home *** CD-ROM | disk | FTP | other *** search
Text File | 1995-07-06 | 214.9 KB | 7,853 lines | [TEXT/R*ch] |
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Functional programming
-
- using Caml Light
-
-
- Michel Mauny
-
- January 1995
-
-
-
-
-
-
-
-
-
-
- Contents
-
-
-
-
- 1 Introduction 4
-
- I Functional programming 6
-
- 2 Functional languages 7
- 2.1 History of functional languages . . . . . . . . . . . . . . . . 8
- 2.2 The ML family . . . . . . . . . . . . . . . . . . . . . . . . . 9
- 2.3 The Miranda family. . . . . . . . . . . . . . . . . . . . . . . 9
-
- 3 Basic concepts 10
- 3.1 Toplevel loop . . . . . . . . . . . . . . . . . . . . . . . . . 10
- 3.2 Evaluation: from expressions to values . . . . . . . . . . . . 11
- 3.3 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
- 3.4 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
- 3.5 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 15
- 3.6 Partial applications. . . . . . . . . . . . . . . . . . . . . . 16
-
- 4 Basic types 18
- 4.1 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
- 4.2 Boolean values. . . . . . . . . . . . . . . . . . . . . . . . . 21
- 4.3 Strings and characters. . . . . . . . . . . . . . . . . . . . . 24
- 4.4 Tuples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
- 4.5 Patterns and pattern-matching . . . . . . . . . . . . . . . . . 25
- 4.6 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
-
- 5 Lists 31
- 5.1 Building lists. . . . . . . . . . . . . . . . . . . . . . . . . 31
- 5.2 Extracting elements from lists: pattern-matching . . . . . . . 32
- 5.3 Functions over lists. . . . . . . . . . . . . . . . . . . . . . 33
-
- 6 User-defined types 36
- 6.1 Product types . . . . . . . . . . . . . . . . . . . . . . . . . 36
- 6.2 Sum types . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
- 6.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
-
- II Caml Light specifics 47
-
- 7 Mutable data structures 48
- 7.1 User-defined mutable data structures. . . . . . . . . . . . . . 48
-
-
- 1
-
-
-
- 7.2 The ref type. . . . . . . . . . . . . . . . . . . . . . . . . . 50
- 7.3 Arrays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
- 7.4 Loops: while and for . . . . . . . . . . . . . . . . . . . . . 52
- 7.5 Polymorphism and mutable data structures. . . . . . . . . . . . 53
-
- 8 Escaping from computations: exceptions 55
- 8.1 Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . 55
- 8.2 Raising an exception. . . . . . . . . . . . . . . . . . . . . . 56
- 8.3 Trapping exceptions . . . . . . . . . . . . . . . . . . . . . . 57
- 8.4 Polymorphism and exceptions . . . . . . . . . . . . . . . . . . 58
-
- 9 Basic input/output 60
- 9.1 Printable types . . . . . . . . . . . . . . . . . . . . . . . . 60
- 9.2 Output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
- 9.3 Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
- 9.4 Channels on files . . . . . . . . . . . . . . . . . . . . . . . 63
-
- 10 Streams and parsers 66
- 10.1 Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
- 10.2 Stream matching and parsers . . . . . . . . . . . . . . . . . . 68
- 10.3 Parameterized parsers . . . . . . . . . . . . . . . . . . . . . 70
- 10.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . 78
-
- 11 Standalone programs and separate compilation 79
- 11.1 Standalone programs . . . . . . . . . . . . . . . . . . . . . . 79
- 11.2 Programs in several files . . . . . . . . . . . . . . . . . . . 80
- 11.3 Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 82
-
- III A complete example 84
-
- 12 ASL: A Small Language 85
- 12.1 ASL abstract syntax trees . . . . . . . . . . . . . . . . . . . 85
- 12.2 Parsing ASL programs. . . . . . . . . . . . . . . . . . . . . . 86
-
- 13 Untyped semantics of ASL programs 92
- 13.1 Semantic values . . . . . . . . . . . . . . . . . . . . . . . . 92
- 13.2 Semantic functions. . . . . . . . . . . . . . . . . . . . . . . 93
- 13.3 Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
-
- 14 Encoding recursion 95
- 14.1 Fixpoint combinators. . . . . . . . . . . . . . . . . . . . . . 95
- 14.2 Recursion as a primitive construct. . . . . . . . . . . . . . . 97
-
- 15 Static typing, polymorphism and type synthesis 98
- 15.1 The type system . . . . . . . . . . . . . . . . . . . . . . . . 98
- 15.2 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 103
- 15.3 The ASL type-synthesizer. . . . . . . . . . . . . . . . . . . . 105
-
- 16 Compiling ASL to an abstract machine code 114
- 16.1 The Abstract Machine. . . . . . . . . . . . . . . . . . . . . . 114
- 16.2 Compiling ASL programs into CAM code. . . . . . . . . . . . . . 117
-
-
- 2
-
-
-
- 16.3 Execution of CAM code . . . . . . . . . . . . . . . . . . . . . 119
-
- 17 Answers to exercises 122
-
- 18 Conclusions and further reading 134
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 3
-
-
-
-
-
-
-
-
-
- Chapter 1
-
-
-
- Introduction
-
-
-
- This document is a tutorial introduction to functional programming, and,
- more precisely, to the usage of Caml Light. It has been used to teach Caml
- Light(1) in different universities and is intended for beginners. It
- contains numerous examples and exercises, and absolute beginners should read
- it while sitting in front of a Caml Light toplevel loop, testing examples
- and variations by themselves.
- After generalities about functional programming, some features specific to
- Caml Light are described. ML type synthesis and a simple execution model
- are presented in a complete example of prototyping a subset of ML.
- Part I (chapters 2--6) may be skipped by users familiar with ML. Users
- with experience in functional programming, but unfamiliar with the ML
- dialects may skip the very first chapters and start at chapter 6, learning
- the Caml Light syntax from the examples. Part I starts with some intuition
- about functions and types and gives an overview of ML and other functional
- languages (chapter 2). Chapter 3 outlines the interaction with the Caml
- Light toplevel loop and its basic objects. Basic types and some of their
- associated primitives are presented in chapter 4. Lists (chapter 5) and
- user-defined types (chapter 6) are structured data allowing for the
- representation of complex objects and their easy creation and
- destructuration.
- While concepts presented in part I are common (under one form or another)
- to many functional languages, part B (chapters 7--11) is dedicated to
- features specific to Caml Light: mutable data structures (chapter 7),
- exception handling (chapter 8), input/output (chapter 9) and streams and
- parsers (chapter 10) show a more imperative side of the language.
- Standalone programs and separate compilation (chapter 11) allow for modular
- programming and the creation of standalone applications. Concise examples
- of Caml Light features are to be found in this part.
- Part C (chapters 12--16) is meant for already experienced Caml Light users
- willing to know more about how the Caml Light compiler synthesizes the types
- of expression and how compilation and evaluation proceeds. Some knowledge
- about first-order unification is assumed. The presentation is rather
-
- ------------------------------
- 1. The ``Caml Strong'' version of these notes is available as an INRIA
- technical report [24].
-
-
- 4
-
-
-
- informal, and is sometimes terse (specially in the chapter about type
- synthesis). We prototype a small and simple functional language (called
- ASL): we give the complete prototype implementation, from the ASL parser to
- the symbolic execution of code. Lexing and parsing of ASL programs are
- presented in chapter 12, providing realistic usages of streams and parsers.
- Chapter 13 presents an untyped call-by-value semantics of ASL programs
- through the definition of an ASL interpreter. The encoding of recursion in
- untyped ASL is presented in chapter 14, showing the expressive power of the
- language. The type synthesis of functional programs is demonstrated in
- chapter 15, using destructive unification (on first-order terms representing
- types) as a central tool. Chapter 16 introduces the Categorical Abstract
- Machine: a simple execution model for call-by-value functional programs.
- Although the Caml Light execution model is different from the one presented
- here, an intuition about the simple compilation of functional languages can
- be found in this chapter.
-
- Warning: The programs and remarks (especially contained in parts B and C)
- might not be valid in Caml Light versions different from 0.7.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 5
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Part I
-
-
-
- Functional programming
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 6
-
-
-
-
-
-
-
-
-
- Chapter 2
-
-
-
- Functional languages
-
-
-
- Programming languages are said to be functional when the basic way of
- structuring programs is the notion of function and their essential control
- structure is function application. For example, the Lisp language [22], and
- more precisely its modern successor Scheme [31, 1], has been called
- functional because it possesses these two properties.
- However, we want the programming notion of function to be as close as
- possible to the usual mathematical notion of function. In mathematics,
- functions are ``first-class'' objects: they can be arbitrarily manipulated.
- For example, they can be composed, and the composition function is itself a
- function.
- In mathematics, one would present the successor function in the following
- way:
-
- successor: N -> N
- n |-> n+1
-
- The functional composition could be presented as:
-
- o: (A=>B)x (C=>A) -> (C=>B)
- (f,g) |-> (x |-> f (g x))
-
- where (A=>B) denotes the space of functions from A to B.
- We remark here the importance of:
-
- 1. the notion of type; a mathematical function always possesses a domain
- and a codomain. They will correspond to the programming notion of type.
-
- 2. lexical binding: when we wrote the mathematical definition of
- successor, we have assumed that the addition function + had been
- previously defined, mapping a pair of natural numbers to a natural
- number; the meaning of the successor function is defined using the
- meaning of the addition: whatever + denotes in the future, this
- successor function will remain the same.
-
- 3. the notion of functional abstraction, allowing to express the behavior
- of fog as (x |-> f (g x)), i.e. the function which, when given some x,
-
-
- 7
-
-
-
- returns f (g x).
-
- ML dialects (cf. below) respect these notions. But they also allow
- non-functional programming styles, and, in this sense, they are functional
- but not purely functional.
-
-
- 2.1 History of functional languages
-
- Some historical points:
-
- - 1930: Alonzo Church developed the l-calculus [6] as an attempt to
- provide a basis for mathematics. The l-calculus is a formal theory for
- functionality. The three basic constructs of the l-calculus are:
-
-
- - variable names (e.g. x, y,...);
-
- - application (MN if M and M are terms);
-
- - functional abstraction (lx.M).
-
-
- Terms of the l-calculus represent functions. The pure l-calculus has
- been proved inconsistent as a logical theory. Some type systems have
- been added to it in order to remedy this inconsistency.
-
- - 1958: Mac Carthy invented Lisp [22] whose programs have some
- similarities with terms of the l-calculus. Lisp dialects have been
- recently evolving in order to be closer to modern functional languages
- (Scheme), but they still do not possess a type system.
-
- - 1965: P. Landin proposed the ISWIM [18] language (for ``If You See What
- I Mean''), which is the precursor of languages of the ML family.
-
- - 1978: J. Backus introduced FP: a language of combinators [3] and a
- framework in which it is possible to reason about programs. The main
- particularity of FP programs is that they have no variable names.
-
- - 1978: R. Milner proposes a language called ML [11], intended to be the
- metalanguage of the LCF proof assistant (i.e. the language used to
- program the search of proofs). This language is inspired by ISWIM
- (close to l-calculus) and possesses an original type system.
-
- - 1985: D. Turner proposed the Miranda [36] programming language, which
- uses Milner's type system but where programs are submitted to lazy
- evaluation.
-
- Currently, the two main families of functional languages are the ML and the
- Miranda families.
-
-
-
- 8
-
-
-
- 2.2 The ML family
-
- ML languages are based on a sugared(1) version of l-calculus. Their
- evaluation regime is call-by-value (i.e. the argument is evaluated before
- being passed to a function), and they use Milner's type system.
- The LCF proof system appeared in 1972 at Stanford (Stanford LCF). It has
- been further developed at Cambridge (Cambridge LCF) where the ML language
- was added to it.
- From 1981 to 1986, a version of ML and its compiler was developed in a
- collaboration between INRIA and Cambridge by G. Cousineau, G. Huet and L.
- Paulson.
- In 1981, L. Cardelli implemented a version of ML whose compiler generated
- native machine code.
- In 1984, a committee of researchers from the universities of Edinburgh and
- Cambridge, Bell Laboratories and INRIA, headed by R. Milner worked on a new
- extended language called Standard ML [28]. This core language was completed
- by a module facility designed by D. MacQueen [23].
- Since 1984, the Caml language has been under design in a collaboration
- between INRIA and LIENS(2)). Its first release appeared in 1987. The main
- implementors of Caml were Ascander Suarez, Pierre Weis and Michel Mauny.
- In 1989 appeared Standard ML of New-Jersey, developed by Andrew Appel
- (Princeton University) and David MacQueen (Bell Laboratories).
- Caml Light is a smaller, more portable implementation of the core Caml
- language, developed by Xavier Leroy since 1990.
-
-
- 2.3 The Miranda family
-
- All languages in this family use lazy evaluation (i.e. the argument of a
- function is evaluated if and when the function needs its value---arguments
- are passed unevaluated to functions). They also use Milner's type system.
- Languages belonging to the Miranda family find their origin in the SASL
- language [34] (1976) developed by D. Turner. SASL and its successors (KRC
- [35] 1981, Miranda [36] 1985 and Haskell [15] 1990) use sets of mutually
- recursive equations as programs. These equations are written in a script
- (collection of declarations) and the user may evaluate expressions using
- values defined in this script. LML (Lazy ML) has been developed in Goteborg
- (Sweeden); its syntax is close to ML's syntax and it uses a fast execution
- model: the G-machine [16].
-
-
-
-
-
-
-
-
-
- ------------------------------
- 1. i.e. with a more user-friendly syntax.
- 2. Laboratoire d'Informatique de l'Ecole Normale Superieure, 45 Rue d'Ulm,
- 75005 Paris
-
-
- 9
-
-
-
-
-
-
-
-
-
- Chapter 3
-
-
-
- Basic concepts
-
-
-
- We examine in this chapter some fundamental concepts which we will use and
- study in the following chapters. Some of them are specific to the interface
- with the Caml language (toplevel, global definitions) while others are
- applicable to any functional language.
-
-
- 3.1 Toplevel loop
-
- The first contact with Caml is through its interactive aspect(1). When
- running Caml on a computer, we enter a toplevel loop where Caml waits for
- input from the user, and then gives a response to what has been entered.
- The beginning of a Caml Light session looks like this (assuming $ is the
- shell prompt on the host machine):
-
- $camllight
- > Caml Light version 0.6
-
- #
-
- On the PC version, the command is called caml.
- The ``#'' character is Caml's prompt. When this character appears at the
- beginning of a line in an actual Caml Light session, the toplevel loop is
- waiting for a new toplevel phrase to be entered.
- Throughout this document, the phrases starting by the # character
- represent legal input to Caml. Since this document has been pre-processed
- by Caml Light and these lines have been effectively given as input to a Caml
- Light process, the reader may reproduce by him/herself the session contained
- in each chapter (each chapter of the first two parts contains a different
- session, the third part is a single session). Lines starting with the ``>''
- character are Caml Light error messages. Lines starting by another
- character are either Caml responses or (possibly) illegal input. The input
-
- ------------------------------
- 1. Caml Light implementations also possess a batch compiler usable to
- compile files and produce standalone applications; this will be discussed in
- chapter 11.
-
-
-
- 10
-
-
-
- is printed in typewriter font (like this), and output is written using
- slanted typewriter font (like that).
- Important: Of course, when developing non-trivial programs, it is
- preferable to edit programs in files and then to include the files in the
- toplevel, instead of entering the programs directly. Furthermore, when
- debugging, it is very useful to trace some functions, to see with what
- arguments they are called, and what result they return. See the reference
- manual [20] for a description of these facilities.
-
-
- 3.2 Evaluation: from expressions to values
-
- Let us enter an arithmetic expression and see what is Caml's response:
-
- #1+2;;
- - : int = 3
-
- The expression ``1+2'' has been entered, followed by ``;;'' which represents
- the end of the current toplevel phrase. When encountering ``;;'', Caml
- enters the type-checking (more precisely type synthesis) phase, and prints
- the inferred type for the expression. Then, it compiles code for the
- expression, executes it and, finally, prints the result.
- In the previous example, the result of evaluation is printed as ``3'' and
- the type is ``int'': the type of integers.
- The process of evaluation of a Caml expression can be seen as transforming
- this expression until no further transformation is allowed. These
- transformations must preserve semantics. For example, if the expression
- ``1+2'' has the mathematical object 3 as semantics, then the result ``3''
- must have the same semantics. The different phases of the Caml evaluation
- process are:
-
- - parsing (checking the syntactic legality of input);
-
- - type synthesis;
-
- - compiling;
-
- - loading;
-
- - executing;
-
- - printing the result of execution.
-
- Let us consider another example: the application of the successor function
- to 2+3. The expression (function x -> x+1) should be read as ``the function
- that, given x, returns x+1''. The juxtaposition of this expression to (2+3)
- is function application.
-
- #(function x -> x+1) (2+3);;
- - : int = 6
-
-
-
- 11
-
-
-
- There are several ways to reduce that value before obtaining the result 6.
- Here are two of them (the expression being reduced at each step is
- underlined):
-
- (function x -> x+1) (2+3) (function x -> x+1) (2+3)
- ------ --------------------------
- | |
- (function x -> x+1) 5 (2+3) + 1
- ---------------------- ------
- | |
- 5+1 5+1
- ---- ----
- | |
- 6 6
-
- The transformations used by Caml during evaluation cannot be described in
- this chapter, since they imply knowledge about compilation of Caml programs
- and machine representation of Caml values. However, since the basic control
- structure of Caml is function application, it is quite easy to give an idea
- of the transformations involved in the Caml evaluation process by using the
- kind of rewriting we used in the last example. The evaluation of the
- (well-typed) application e1(e2), where e1 and e2 denote arbitrary
- expressions, consists in the following steps:
-
- - Evaluate e2, obtaining its value v.
-
- - Evaluate e1 until it becomes a functional value. Because of the
- well-typing hypothesis, e1 must represent a function from some type t1
- to some t2, and the type of v is t1. We will write (function x -> e)
- for the result of the evaluation of e1. It denotes the mathematical
- object usually written as:
-
-
- f: t1->t2
- x|->e (where, of course, e may depend on x)
-
-
- N.B.: We do not evaluate e before we know the value of x.
-
- - Evaluate e where v has been substituted for all occurrences of x. We
- then get the final value of the original expression. The result is of
- type t2.
-
- In the previous example, we evaluate:
-
- - 2+3 to 5;
-
- - (function x -> x+1) to itself (it is already a function body);
-
- - x+1 where 5 is substituted for x, i.e. evaluate 5+1, getting 6 as
- result.
-
- It should be noticed that Caml uses call-by-value: arguments are evaluated
- before being passed to functions. The relative evaluation order of the
-
-
- 12
-
-
-
- functional expression and of the argument expression is
- implementation-dependent, and should not be relied upon. The Caml Light
- implementation evaluates arguments before functions (right-to-left
- evaluation), as shown above; the original Caml implementation evaluates
- functions before arguments (left-to-right evaluation).
-
-
- 3.3 Types
-
- Types and their checking/synthesis are crucial to functional programming:
- they provide an indication about the correctness of programs.
- The universe of Caml values is partitioned into types. A type represents
- a collection of values. For example, int is the type of integer numbers,
- and float is the type of floating-point numbers. Truth values belong to the
- bool type. Character strings belong to the string type. Types are divided
- into two classes:
-
- - Basic types (int, bool, string, ...).
-
- - Compound types such as functional types. For example, the type of
- functions from integers to integers is denoted by int -> int. The type
- of functions from boolean values to character strings is written
- bool -> string. The type of pairs whose first component is an integer
- and whose second component is a boolean value is written int * bool.
-
- In fact, any combination of the types above (and even more!) is possible.
- This could be written as:
-
- BasicType ::= int
- | bool
- | string
-
- Type ::= BasicType
- | Type -> Type
- | Type * Type
-
- Once a Caml toplevel phrase has been entered in the computer, the Caml
- process starts analyzing it. First of all, it performs syntax analysis,
- which consists in checking whether the phrase belongs to the language or
- not. For example, here is a syntax error(2) (a parenthesis is missing):
-
- #(function x -> x+1 (2+3);;
- Toplevel input:
- >(function x -> x+1 (2+3);;
- > ^^
- Syntax error.
-
- ------------------------------
- 2. Actually, lexical analysis takes place before syntax analysis and
- lexical errors may occur, detecting for instance badly formed character
- constants.
-
-
-
- 13
-
-
-
- The carets ``^^^'' underline the location where the error was detected.
- The second analysis performed is type analysis: the system attempts to
- assign a type to each subexpression of the phrase, and to synthesize the
- type of the whole phrase. If type analysis fails (i.e. if the system is
- unable to assign a sensible type to the phrase), then a type error is
- reported and Caml waits for another input, rejecting the current phrase.
- Here is a type error (cannot add a number to a boolean):
-
- #(function x -> x+1) (2+true);;
- Toplevel input:
- >(function x -> x+1) (2+true);;
- > ^^^^
- This expression has type bool,
- but is used with type int.
-
- The rejection of ill-typed phrases is called strong typing. Compilers for
- weakly typed languages (C, for example) would instead issue a warning
- message and continue their work at the risk of getting a ``Bus error'' or
- ``Illegal instruction'' message at run-time.
- Once a sensible type has been deduced for the expression, then the
- evaluation process (compilation, loading and execution) may begin.
- Strong typing enforces writing clear and well-structured programs.
- Moreover, it adds security and increases the speed of program development,
- since most typos and many conceptual errors are trapped and signaled by the
- type analysis. Finally, well-typed programs do not need dynamic type tests
- (the addition function does not need to test whether or not its arguments
- are numbers), hence static type analysis allows for more efficient machine
- code.
-
-
- 3.4 Functions
-
- The most important kind of values in functional programming are functional
- values. Mathematically, a function f of type A->B is a rule of
- correspondence which associates with each element of type A a unique member
- of type B.
- If x denotes an element of A, then we will write f(x) for the application
- of f to x. Parentheses are often useless (they are used only for grouping
- subexpressions), so we could also write (f(x)) as well as (f((x))) or simply
- f x. The value of f x is the unique element of B associated with x by the
- rule of correspondence for f.
- The notation f(x) is the one normally employed in mathematics to denote
- functional application. However, we shall be careful not to confuse a
- function with its application. We say ``the function f with formal
- parameter x'', meaning that f has been defined by:
-
- f:x|->e
-
- or, in Caml, that the body of f is something like (function x -> ...).
- Functions are values as other values. In particular, functions may be
- passed as arguments to other functions, and/or returned as result. For
-
-
- 14
-
-
-
- example, we could write:
-
- #function f -> (function x -> (f(x+1) - 1));;
- - : (int -> int) -> int -> int = <fun>
-
- That function takes as parameter a function (let us call it f) and returns
- another function which, when given an argument (let us call it x), will
- return the predecessor of the value of the application f(x+1).
- The type of that function should be read as:
- (int -> int) -> (int -> int).
-
-
- 3.5 Definitions
-
- It is important to give names to values. We have already seen some named
- values: we called them formal parameters. In the expression
- (function x -> x+1), the name x is a formal parameter. Its name is
- irrelevant: changing it into another one does not change the meaning of the
- expression. We could have written that function (function y -> y+1).
- If now we apply this function to, say, 1+2, we will evaluate the
- expression y+1 where y is the value of 1+2. Naming y the value of 1+2 in
- y+1 is written as:
-
- #let y=1+2 in y+1;;
- - : int = 4
-
- This expression is a legal Caml phrase, and the let construct is indeed
- widely used in Caml programs.
- The let construct introduces local bindings of values to identifiers.
- They are local because the scope of y is restricted to the expression (y+1).
- The identifier y kept its previous binding (if any) after the evaluation of
- the ``let ...in . ..'' construct. We can check that y has not been
- globally defined by trying to evaluate it:
-
- #y;;
- Toplevel input:
- >y;;
- >^
- The value identifier y is unbound.
-
- Local bindings using let also introduce sharing of (possibly time-consuming)
- evaluations. When evaluating ``let x=e1 in e2'', e1 gets evaluated only
- once. All occurrences of x in e2 access the value of e1 which has been
- computed once. For example, the computation of:
-
- let big = sum_of_prime_factors 35461243
- in big+(2+big)-(4*(big+1));;
-
- will be less expensive than:
-
- (sum_of_prime_factors 35461243)
-
-
- 15
-
-
-
- + (2 + (sum_of_prime_factors 35461243))
- - (4 * (sum_of_prime_factors 35461243 + 1));;
-
- The let construct also has a global form for toplevel declarations, as in:
-
- #let successor = function x -> x+1;;
- successor : int -> int = <fun>
-
- #let square = function x -> x*x;;
- square : int -> int = <fun>
-
- When a value has been declared at toplevel, it is of course available during
- the rest of the session.
-
- #square(successor 3);;
- - : int = 16
-
- #square;;
- - : int -> int = <fun>
-
- When declaring a functional value, there are some alternative syntaxes
- available. For example we could have declared the square function by:
-
- #let square x = x*x;;
- square : int -> int = <fun>
-
- or (closer to the mathematical notation) by:
-
- #let square (x) = x*x;;
- square : int -> int = <fun>
-
- All these definitions are equivalent.
-
-
- 3.6 Partial applications
-
- A partial application is the application of a function to some but not all
- of its arguments. Consider, for example, the function f defined by:
-
- #let f x = function y -> 2*x+y;;
- f : int -> int -> int = <fun>
-
- Now, the expression f(3) denotes the function which when given an argument y
- returns the value of 2*3+y. The application f(x) is called a partial
- application, since f waits for two successive arguments, and is applied to
- only one. The value of f(x) is still a function.
- We may thus define an addition function by:
-
- #let addition x = function y -> x+y;;
- addition : int -> int -> int = <fun>
-
-
-
- 16
-
-
-
- This can also be written:
-
- #let addition x y = x+y;;
- addition : int -> int -> int = <fun>
-
- We can then define the successor function by:
-
- #let successor = addition 1;;
- successor : int -> int = <fun>
-
- Now, we may use our successor function:
-
- #successor (successor 1);;
- - : int = 3
-
-
-
- Exercises
-
- Exercise 3.1 Give examples of functions with the following types:
-
- 1. (int -> int) -> int
-
- 2. int -> (int -> int)
-
- 3. (int -> int) -> (int -> int)
-
- Exercise 3.2 We have seen that the names of formal parameters are
- meaningless. It is then possible to rename x by y in (function x -> x+x).
- What should we do in order to rename x in y in
-
- (function x -> (function y -> x+y))
-
- Hint: rename y by z first. Question: why?
-
- Exercise 3.3 Evaluate the following expressions (rewrite them until no
- longer possible):
-
- let x=1+2 in ((function y -> y+x) x);;
- let x=1+2 in ((function x -> x+x) x);;
- let f1 = function f2 -> (function x -> f2 x)
- in let g = function x -> x+1
- in f1 g 2;;
-
-
-
-
-
-
-
-
-
-
- 17
-
-
-
-
-
-
-
-
-
- Chapter 4
-
-
-
- Basic types
-
-
-
- We examine in this chapter the Caml basic types.
-
-
- 4.1 Numbers
-
- Caml Light provides two numeric types: integers (type int) and
- floating-point numbers (type float). Integers are limited to the range
- -230... 230-1. Integer arithmetic is taken modulo 231 ; that is, an
- integer operation that overflows does not raise an error, but the result
- simply wraps around. Predefined operations (functions) on integers include:
-
- + addition
- - subtraction and unary minus
- * multiplication
- / division
- mod modulo
-
- Some examples of expressions:
-
- #3 * 4 + 2;;
- - : int = 14
-
- #3 * (4 + 2);;
- - : int = 18
-
- #3 - 7 - 2;;
- - : int = -6
-
- There are precedence rules that make * bind tighter than +, and so on. In
- doubt, write extra parentheses.
- So far, we have not seen the type of these arithmetic operations. They
- all expect two integer arguments; so, they are functions taking one integer
- and returning a function from integers to integers. The (functional) value
- of such infix identifiers may be obtained by taking their prefix version.
-
- #prefix + ;;
-
-
- 18
-
-
-
- - : int -> int -> int = <fun>
-
- #prefix * ;;
- - : int -> int -> int = <fun>
-
- #prefix mod ;;
- - : int -> int -> int = <fun>
-
- As shown by their types, the infix operators +, *, ..., do not work on
- floating-point values. A separate set of floating-point arithmetic
- operations is provided:
-
- +. addition
- -. subtraction and unary minus
- *. multiplication
- /. division
- sqrt square root
- exp, log exponentiation and logarithm
- sin, cos,.. . usual trigonometric operations
-
- We have two conversion functions to go back and forth between integers and
- floating-point numbers: int_of_float and float_of_int.
-
- #1 + 2.3;;
- Toplevel input:
- >1 + 2.3;;
- > ^^^
- This expression has type float,
- but is used with type int.
-
- #float_of_int 1 +. 2.3;;
- - : float = 3.3
-
- Let us now give some examples of numerical functions. We start by
- defining some very simple functions on numbers:
-
- #let square x = x *. x;;
- square : float -> float = <fun>
-
- #square 2.0;;
- - : float = 4.0
-
- #square (2.0 /. 3.0);;
- - : float = 0.444444444444
-
- #let sum_of_squares (x,y) = square(x) +. square(y);;
- sum_of_squares : float * float -> float = <fun>
-
- #let half_pi = 3.14159 /. 2.0
- #in sum_of_squares(cos(half_pi), sin(half_pi));;
- - : float = 1.0
-
-
- 19
-
-
-
- We now develop a classical example: the computation of the root of a
- function by Newton's method. Newton's method can be stated as follows: if
- y is an approximation to a root of a function f, then:
-
- f(y)
- y -------
- f'(y)
-
- is a better approximation, where f'(y) is the derivative of f evaluated at
- y. For example, with f(x)=x2 -a, we have f'(x) =2x, and therefore:
-
- a
- f(y) y2- a y +--
- y
- y- ------=y- ------=-----
- f'(y) 2y 2
-
- We can define a function deriv for approximating the derivative of a
- function at a given point by:
-
- #let deriv f x dx = (f(x+.dx) -. f(x)) /. dx;;
- deriv : (float -> float) -> float -> float -> float = <fun>
-
- Provided dx is sufficiently small, this gives a reasonable estimate of the
- derivative of f at x.
- The following function returns the absolute value of its floating point
- number argument. It uses the ``if ...then . ..else'' conditional
- construct.
-
- #let abs x = if x >. 0.0 then x else -. x;;
- abs : float -> float = <fun>
-
- The main function, given below, uses three local functions. The first one,
- until, is an example of a recursive function: it calls itself in its body,
- and is defined using a let rec construct (rec shows that the definition is
- recursive). It takes three arguments: a predicate p on floats, a function
- change from floats to floats, and a float x. If p(x) is false, then until
- is called with last argument change(x), otherwise, x is the result of the
- whole call. We will study recursive functions more closely later. The two
- other auxiliary functions, satisfied and improve, take a float as argument
- and deliver respectively a boolean value and a float. The function
- satisfied asks wether the image of its argument by f is smaller than epsilon
- or not, thus testing wether y is close enough to a root of f. The function
- improve computes the next approximation using the formula given below. The
- three local functions are defined using a cascade of let constructs. The
- whole function is:
-
- #let newton f epsilon =
- # let rec until p change x =
- # if p(x) then x
- # else until p change (change(x)) in
- # let satisfied y = abs(f y) <. epsilon in
- # let improve y = y -. (f(y) /. (deriv f y epsilon))
- #in until satisfied improve;;
-
-
- 20
-
-
-
- newton : (float -> float) -> float -> float -> float = <fun>
-
- Some examples of equation solving:
-
- #let square_root x epsilon =
- # newton (function y -> y*.y -. x) epsilon x
- #and cubic_root x epsilon =
- # newton (function y -> y*.y*.y -. x) epsilon x;;
- square_root : float -> float -> float = <fun>
- cubic_root : float -> float -> float = <fun>
-
- #square_root 2.0 1e-5;;
- - : float = 1.41421569553
-
- #cubic_root 8.0 1e-5;;
- - : float = 2.00000000131
-
- #2.0 *. (newton cos 1e-5 1.5);;
- - : float = 3.14159265359
-
-
-
- 4.2 Boolean values
-
- The presence of the conditional construct implies the presence of boolean
- values. The type bool is composed of two values true and false.
-
- #true;;
- - : bool = true
-
- #false;;
- - : bool = false
-
- The functions with results of type bool are often called predicates. Many
- predicates are predefined in Caml. Here are some of them:
-
- #prefix <;;
- - : 'a -> 'a -> bool = <fun>
-
- #1 < 2;;
- - : bool = true
-
- #prefix <.;;
- - : float -> float -> bool = <fun>
-
- #3.14159 <. 2.718;;
- - : bool = false
-
- There exist also <=, >, >=, and similarly <=., >., >=..
-
-
-
-
- 21
-
-
-
- 4.2.1 Equality
-
- Equality has a special status in functional languages because of functional
- values. It is easy to test equality of values of base types (integers,
- floating-point numbers, booleans, ...):
-
- #1 = 2;;
- - : bool = false
-
- #false = (1>2);;
- - : bool = true
-
- But it is impossible, in the general case, to decide the equality of
- functional values. Hence, equality stops on a run-time error when it
- encounters functional values.
-
- #(fun x -> x) = (fun x -> x);;
- Uncaught exception: Invalid_argument "equal: functional value"
-
- Since equality may be used on values of any type, what is its type?
- Equality takes two arguments of the same type (whatever type it is) and
- returns a boolean value. The type of equality is a polymorphic type, i.e.
- may take several possible forms. Indeed, when testing equality of two
- numbers, then its type is int -> int -> bool, and when testing equality of
- strings, its type is string -> string -> bool.
-
- #prefix =;;
- - : 'a -> 'a -> bool = <fun>
-
- The type of equality uses type variables, written 'a, `b, etc. Any type can
- be substituted to type variables in order to produces specific instances of
- types. For example, substituting int for 'a above produces
- int -> int -> bool, which is the type of the equality function used on
- integer values.
-
- #1=2;;
- - : bool = false
-
- #(1,2) = (2,1);;
- - : bool = false
-
- #1 = (1,2);;
- Toplevel input:
- >1 = (1,2);;
- > ^^^
- This expression has type int * int,
- but is used with type int.
-
-
-
-
-
-
- 22
-
-
-
- 4.2.2 Conditional
-
- Conditional expressions are of the form ``if etest then e1 else e2'', where
- etest is an expression of type bool and e1, e2 are expressions possessing
- the same type. As an example, the logical negation could be written:
-
- #let negate a = if a then false else true;;
- negate : bool -> bool = <fun>
-
- #negate (1=2);;
- - : bool = true
-
-
- 4.2.3 Logical operators
-
- The classical logical operators are available in Caml. Disjunction and
- conjunction are respectively written or and &:
-
- #true or false;;
- - : bool = true
-
- #(1<2) & (2>1);;
- - : bool = true
-
- The operators & and or are not functions. They should not be seen as
- regular functions, since they evaluate their second argument only if it is
- needed. For example, the or operator evaluates its second operand only if
- the first one evaluates to false. The behavior of these operators may be
- described as follows:
-
- e1 or e2 is equivalent to if e1 then true else e2
- e1 & e2 is equivalent to if e1 then e2 else false
-
- Negation is predefined:
-
- #not true;;
- - : bool = false
-
- The not identifier receives a special treatment from the parser: the
- application ``not f x'' is recognized as ``not (f x)'' while ``f g x'' is
- identical to ``(f g) x''. This special treatment explains why the
- functional value associated to not can be obtained only using the prefix
- keyword:
-
- #prefix not;;
- - : bool -> bool = <fun>
-
-
-
-
-
-
-
- 23
-
-
-
- 4.3 Strings and characters
-
- String constants (type string) are written between " characters
- (double-quotes). Single-character constants (type char) are written between
- ` characters (backquotes). The most used string operation is string
- concatenation, denoted by the ^ character.
-
- #"Hello" ^ " World!";;
- - : string = "Hello World!"
-
- #prefix ^;;
- - : string -> string -> string = <fun>
-
- Characters are ASCII characters:
-
- #`a`;;
- - : char = `a`
-
- #`\065`;;
- - : char = `A`
-
- and can be built from their ASCII code as in:
-
- #char_of_int 65;;
- - : char = `A`
-
- Other operations are available (sub_string, int_of_char, etc). See the Caml
- Light reference manual [20] for complete information.
-
-
- 4.4 Tuples
-
- 4.4.1 Constructing tuples
-
- It is possible to combine values into tuples (pairs, triples, ...). The
- value constructor for tuples is the ``,'' character (the comma). We will
- often use parentheses around tuples in order to improve readability, but
- they are not strictly necessary.
-
- #1,2;;
- - : int * int = 1, 2
-
- #1,2,3,4;;
- - : int * int * int * int = 1, 2, 3, 4
-
- #let p = (1+2, 1<2);;
- p : int * bool = 3, true
-
- The infix ``*'' identifier is the type constructor for tuples. For
- instance, t1t2 corresponds to the mathematical cartesian product of types
- t1 and t2.
-
-
- 24
-
-
-
- We can build tuples from any values: the tuple value constructor is
- generic.
-
- 4.4.2 Extracting pair components
-
- Projection functions are used to extract components of tuples. For pairs,
- we have:
-
- #fst;;
- - : 'a * 'b -> 'a = <fun>
-
- #snd;;
- - : 'a * 'b -> 'b = <fun>
-
- They have polymorphic types, of course, since they may be applied to any
- kind of pair. They are predefined in the Caml system, but could be defined
- by the user (in fact, the cartesian product itself could be defined --- see
- section 6.1, dedicated to user-defined product types):
-
- #let first (x,y) = x
- #and second (x,y) = y;;
- first : 'a * 'b -> 'a = <fun>
- second : 'a * 'b -> 'b = <fun>
-
- #first p;;
- - : int = 3
-
- #second p;;
- - : bool = true
-
- We say that first is a generic value because it works uniformly on several
- kinds of arguments (provided they are pairs). We often confuse between
- ``generic'' and ``polymorphic'', saying that such value is polymorphic
- instead of generic.
-
-
- 4.5 Patterns and pattern-matching
-
- Patterns and pattern-matching play an important role in ML languages. They
- appear in all real programs and are strongly related to types (predefined or
- user-defined).
- A pattern indicates the shape of a value. Patterns are ``values with
- holes''. A single variable (formal parameter) is a pattern (with no shape
- specified: it matches any value). When a value is matched against a
- pattern (this is called pattern-matching), the pattern acts as a filter. We
- already used patterns and pattern-matching in all the functions we wrote:
- the function body (function x -> ...) does (trivial) pattern-matching.
- When applied to a value, the formal parameter x gets bound to that value.
- The function body (function (x,y) -> x+y) does also pattern-matching: when
- applied to a value (a pair, because of well-typing hypotheses), the x and y
- get bound respectively to the first and the second component of that pair.
-
-
- 25
-
-
-
- All these pattern-matching examples were trivial ones, they did not
- involve any test:
-
- - formal parameters such as x do not impose any particular shape to the
- value they are supposed to match;
-
- - pair patterns such as (x,y) always match for typing reasons (cartesian
- product is a product type).
-
- However, some types do not guarantee such a uniform shape to their values.
- Consider the bool type: an element of type bool is either true or false.
- If we impose to a value of type bool to have the shape of true, then
- pattern-matching may fail. Consider the following function:
-
- #let f = function true -> false;;
- Toplevel input:
- >let f = function true -> false;;
- > ^^^^^^^^^^^^^^^^^^^^^^
- Warning: this matching is not exhaustive.
- f : bool -> bool = <fun>
-
- The compiler warns us that the pattern-matching may fail (we did not
- consider the false case).
- Thus, if we apply f to a value that evaluates to true, pattern-matching
- will succeed (an equality test is performed during execution).
-
- #f (1<2);;
- - : bool = false
-
- But, if f's argument evaluates to false, a run-time error is reported:
-
- #f (1>2);;
- Uncaught exception: Match_failure ("", 1346, 1368)
-
- Here is a correct definition using pattern-matching:
-
- #let negate = function true -> false
- # | false -> true;;
- negate : bool -> bool = <fun>
-
- The pattern-matching has now two cases, separated by the ``|'' character.
- Cases are examined in turn, from top to bottom. An equivalent definition of
- negate would be:
-
- #let negate = function true -> false
- # | x -> true;;
- negate : bool -> bool = <fun>
-
- The second case now matches any boolean value (in fact, only false since
- true has been caught by the first match case). When the second case is
- chosen, the identifier x gets bound to the argument of negate, and could be
-
-
- 26
-
-
-
- used in the right-hand part of the match case. Since in our example we do
- not use the value of the argument in the right-hand part of the second match
- case, another equivalent definition of negate would be:
-
- #let negate = function true -> false
- # | _ -> true;;
- negate : bool -> bool = <fun>
-
- Where ``_'' acts as a formal paramenter (matches any value), but does not
- introduce any binding: it should be read as ``anything else''.
- As an example of pattern-matching, consider the following function
- definition (truth value table of implication):
-
- #let imply = function (true,true) -> true
- # | (true,false) -> false
- # | (false,true) -> true
- # | (false,false) -> true;;
- imply : bool * bool -> bool = <fun>
-
- Here is another way of defining imply, by using variables:
-
- #let imply = function (true,x) -> x
- # | (false,x) -> true;;
- imply : bool * bool -> bool = <fun>
-
- Simpler, and simpler again:
-
- #let imply = function (true,x) -> x
- # | (false,_) -> true;;
- imply : bool * bool -> bool = <fun>
-
- #let imply = function (true,false) -> false
- # | _ -> true;;
- imply : bool * bool -> bool = <fun>
-
- Pattern-matching is allowed on any type of value (non-trivial
- pattern-matching is only possible on types with data constructors).
- For example:
-
- #let is_zero = function 0 -> true | _ -> false;;
- is_zero : int -> bool = <fun>
-
- #let is_yes = function "oui" -> true
- # | "si" -> true
- # | "ya" -> true
- # | "yes" -> true
- # | _ -> false;;
- is_yes : string -> bool = <fun>
-
-
-
-
-
- 27
-
-
-
- 4.6 Functions
-
- The type constructor ``->'' is predefined and cannot be defined in ML's type
- system. We shall explore in this section some further aspects of functions
- and functional types.
-
- 4.6.1 Functional composition
-
- Functional composition is easily definable in Caml. It is of course a
- polymorphic function:
-
- #let compose f g = function x -> f (g (x));;
- compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
-
- The type of compose contains no more constraints than the ones appearing in
- the definition: in a sense, it is the most general type compatible with
- these constraints.
- These constraints are:
-
- - the codomain of g and the domain of f must be the same;
-
- - x must belong to the domain of g;
-
- - compose f g x will belong to the codomain of f.
-
- Let's see compose at work:
-
- #let succ x = x+1;;
- succ : int -> int = <fun>
-
- #compose succ list_length;;
- - : '_a list -> int = <fun>
-
- #(compose succ list_length) [1;2;3];;
- - : int = 4
-
-
- 4.6.2 Currying
-
- We can define two versions of the addition function:
-
- #let plus = function (x,y) -> x+y;;
- plus : int * int -> int = <fun>
-
- #let add = function x -> (function y -> x+y);;
- add : int -> int -> int = <fun>
-
- These two functions differ only in their way of taking their arguments. The
- first one will take an argument belonging to a cartesian product, the second
- one will take a number and return a function. The add function is said to
- be the curried version of plus (in honor of the logician Haskell Curry).
-
-
- 28
-
-
-
- The currying transformation may be written in Caml under the form of a
- higher-order function:
-
- #let curry f = function x -> (function y -> f(x,y));;
- curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c = <fun>
-
- Its inverse function may be defined by:
-
- #let uncurry f = function (x,y) -> f x y;;
- uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c = <fun>
-
- We may check the types of curry and uncurry on particular examples:
-
- #uncurry (prefix +);;
- - : int * int -> int = <fun>
-
- #uncurry (prefix ^);;
- - : string * string -> string = <fun>
-
- #curry plus;;
- - : int -> int -> int = <fun>
-
-
-
- Exercises
-
- Exercise 4.1 Define functions that compute the surface area and the volume
- of well-known geometric objects (rectangles, circles, spheres, ...).
-
- Exercise 4.2 What would happen in a language submitted to call-by-value
- with recursion if there was no conditional construct (if ... then ... else
- ...)?
-
- Exercise 4.3 Define the factorial function such that:
-
- factorial n= n*(n -1)* ... *2* 1
-
- Give two versions of factorial: recursive and tail recursive.
-
- Exercise 4.4 Define the fibonacci function that, when given a number n,
- returns the nth Fibonacci number, i.e. the nth term un of the sequence
- defined by:
-
- u1= 1
- u2= 1
- un+2= un+1+un
-
- Exercise 4.5 What are the types of the following expressions?
-
- - uncurry compose
-
-
-
- 29
-
-
-
- - compose curry uncurry
-
- - compose uncurry curry
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 30
-
-
-
-
-
-
-
-
-
- Chapter 5
-
-
-
- Lists
-
-
-
- Lists represent an important data structure, mainly because of their success
- in the Lisp language. Lists in ML are homogeneous: a list cannot contain
- elements of different types. This may be annoying to new ML users, yet
- lists are not as fundamental as in Lisp, since ML provides a facility for
- introducing new types allowing the user to define more precisely the data
- structures needed by the program (cf. chapter 6).
-
-
- 5.1 Building lists
-
- Lists are empty or non empty sequences of elements. They are built with two
- value constructors:
-
- - [], the empty list (read: nil);
-
- - ::, the non-empty list constructor (read: cons). It takes an element e
- and a list l, and builds a new list whose first element (head) is e and
- rest (tail) is l.
-
- The special syntax [e1; ...; en ] builds the list whose elements are
- e1,... ,en. It is equivalent to e1 :: (e2 :: .. .(en :: []). ..).
- We may build lists of numbers:
-
- #1::2::[];;
- - : int list = [1; 2]
-
- #[3;4;5];;
- - : int list = [3; 4; 5]
-
- #let x=2 in [1; 2; x+1; x+2];;
- - : int list = [1; 2; 3; 4]
-
- Lists of functions:
-
- #let adds =
- # let add x y = x+y
- # in [add 1; add 2; add 3];;
-
-
- 31
-
-
-
- adds : (int -> int) list = [<fun>; <fun>; <fun>]
-
- and indeed, lists of anything desired.
- We may wonder what are the types of the list (data) constructors. The
- empty list is a list of anything (since it has no element), it has thus the
- type ``list of anything''. The list constructor :: takes an element and a
- list (containing elements with the same type) and returns a new list. Here
- again, there is no type constraint on the elements considered.
-
- #[];;
- - : 'a list = []
-
- #function head -> function tail -> head::tail;;
- - : 'a -> 'a list -> 'a list = <fun>
-
- We see here that the list type is a recursive type. The :: constructor
- receives two arguments; the second argument is itself a list.
-
-
- 5.2 Extracting elements from lists: pattern-matching
-
- We know how to build lists; we now show how to test emptiness of lists
- (although the equality predicate could be used for that) and extract
- elements from lists (e.g. the first one). We have used pattern-matching on
- pairs, numbers or boolean values. The syntax of patterns also includes list
- patterns. (We will see that any data constructor can actually be used in a
- pattern.) For lists, at least two cases have to be considered (empty, non
- empty):
-
- #let is_null = function [] -> true | _ -> false;;
- is_null : 'a list -> bool = <fun>
-
- #let head = function x::_ -> x
- # | _ -> raise (Failure "head");;
- head : 'a list -> 'a = <fun>
-
- #let tail = function _::l -> l
- # | _ -> raise (Failure "tail");;
- tail : 'a list -> 'a list = <fun>
-
- The expression raise (Failure "head") will produce a run-time error when
- evaluated. In the definition of head above, we have chosen to forbid taking
- the head of an empty list. We could have chosen tail [] to evaluate to [],
- but we cannot return a value for head [] without changing the type of the
- head function.
- We say that the types list and bool are sum types, because they are
- defined with several alternatives:
-
- - a list is either [] or :: of ...
-
- - a boolean value is either true or false
-
-
- 32
-
-
-
- Lists and booleans are typical examples of sum types. Sum types are the
- only types whose values need run-time tests in order to be matched by a
- non-variable pattern (i.e. a pattern that is not a single variable).
- The cartesian product is a product type (only one alternative). Product
- types do not involve run-time tests during pattern-matching, because the
- type of their values suffices to indicate statically what their structure
- is.
-
-
- 5.3 Functions over lists
-
- We will see in this section the definition of some useful functions over
- lists. These functions are of general interest, but are not exhaustive.
- Some of them are predefined in the Caml Light system. See also [9] or [37]
- for other examples of functions over lists.
- Computation of the length of a list:
-
- #let rec length = function [] -> 0
- # | _::l -> 1 + length(l);;
- length : 'a list -> int = <fun>
-
- #length [true; false];;
- - : int = 2
-
- Concatenating two lists:
-
- #let rec append =
- # function [], l2 -> l2
- # | (x::l1), l2 -> x::(append (l1,l2));;
- append : 'a list * 'a list -> 'a list = <fun>
-
- The append function is already defined in Caml, and bound to the infix
- identifier @.
-
- #[1;2] @ [3;4];;
- - : int list = [1; 2; 3; 4]
-
- Reversing a list:
-
- #let rec rev = function [] -> []
- # | x::l -> (rev l) @ [x];;
- rev : 'a list -> 'a list = <fun>
-
- #rev [1;2;3];;
- - : int list = [3; 2; 1]
-
- The map function applies a function on all the elements of a list, and
- return the list of the function results. It demonstrates full functionality
- (it takes a function as argument), list processing, and polymorphism (note
- the sharing of type variables between the arguments of map in its type).
-
-
-
- 33
-
-
-
- #let rec map f =
- # function [] -> []
- # | x::l -> (f x)::(map f l);;
- map : ('a -> 'b) -> 'a list -> 'b list = <fun>
-
- #map (function x -> x+1) [1;2;3;4;5];;
- - : int list = [2; 3; 4; 5; 6]
-
- #map length [ [1;2;3]; [4;5]; [6]; [] ];;
- - : int list = [3; 2; 1; 0]
-
- The following function is a list iterator. It takes a function f, a base
- element a and a list [x1;...;xn]. It computes:
-
- itlist f a [x1;. ..; xn]=f (. ..(f (f a x1) x2) ...)xn
-
- For instance, when applied to the curried addition function, to the base
- element 0, and to a list of numbers, it computes the sum of all elements in
- the list. The expression:
-
- it_list (prefix +) 0 [1;2;3;4;5]
-
- evaluates to ((((0+1)+2)+3)+4)+5
-
- i.e. to 15.
-
- #let rec it_list f a =
- # function [] -> a
- # | x::l -> it_list f (f a x) l;;
- it_list : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a = <fun>
-
- #let sigma = it_list prefix + 0;;
- sigma : int list -> int = <fun>
-
- #sigma [1;2;3;4;5];;
- - : int = 15
-
- #it_list (prefix *) 1 [1;2;3;4;5];;
- - : int = 120
-
- The it_list function is one of the many ways to iterate over a list. For
- other list iteration functions, see [9].
-
-
- Exercises
-
- Exercise 5.1 Define the combine function which, when given a pair of lists,
- returns a list of pairs such that:
-
- combine ([x1;...;xn],[y1;...;yn]) = [(x1,y1);...;(xn,yn)]
-
-
-
- 34
-
-
-
- The function generates a run-time error if the argument lists do not have
- the same length.
-
- Exercise 5.2 Define a function which, when given a list, returns the list
- of all its sublists.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 35
-
-
-
-
-
-
-
-
-
- Chapter 6
-
-
-
- User-defined types
-
-
-
- The user is allowed to define his/her own data types. With this facility,
- there is no need to encode the data structures that must be manipulated by a
- program into lists (as in Lisp) or into arrays (as in Fortran).
- Furthermore, early detection of type errors is enforced, since user-defined
- data types reflect precisely the needs of the algorithms.
- Types are either:
-
- - product types,
-
- - or sum types.
-
- We have already seen examples of both kinds of types: the bool and list
- types are sum types (they contain values with different shapes and are
- defined and matched using several alternatives). The cartesian product is
- an example of a product type: we know statically the shape of values
- belonging to cartesian products.
- In this chapter, we will see how to define and use new types in Caml.
-
-
- 6.1 Product types
-
- Product types are finite labeled products of types. They are a
- generalization of cartesian product. Elements of product types are called
- records.
-
- 6.1.1 Defining product types
-
- An example: suppose we want to define a data structure containing
- information about individuals. We could define:
-
- #let jean=("Jean",23,"Student","Paris");;
- jean : string * int * string * string = "Jean", 23, "Student", "Paris"
-
- and use pattern-matching to extract any particular information about the
- person jean. The problem with such usage of cartesian product is that a
- function name_of returning the name field of a value representing an
-
-
- 36
-
-
-
- individual would have the same type as the general first projection for
- 4-tuples (and indeed would be the same function). This type is not precise
- enough since it allows for the application of the function to any 4-uple,
- and not only to values such as jean.
- Instead of using cartesian product, we define a person data type:
-
- #type person =
- # {Name:string; Age:int; Job:string; City:string};;
- Type person defined.
-
- The type person is the product of string, int, string and string. The field
- names provide type information and also documentation: it is much easier to
- understand data structures such as jean above than arbitrary tuples.
- We have labels (i.e. Name, ...) to refer to components of the products.
- The order of appearance of the products components is not relevant: labels
- are sufficient to uniquely identify the components. The Caml system finds a
- canonical order on labels to represent and print record values. The order
- is always the order which appeared in the definition of the type.
- We may now define the individual jean as:
-
- #let jean = {Job="Student"; City="Paris";
- # Name="Jean"; Age=23};;
- jean : person = {Name="Jean"; Age=23; Job="Student"; City="Paris"}
-
- This example illustrates the fact that order of labels is not relevant.
-
- 6.1.2 Extracting products components
-
- The canonical way of extracting product components is pattern-matching.
- Pattern-matching provides a way to mention the shape of values and to give
- (local) names to components of values. In the following example, we name n
- the numerical value contained in the field Age of the argument, and we
- choose to forget values contained in other fields (using the _ character).
-
- #let age_of = function
- # {Age=n; Name=_; Job=_; City=_} -> n;;
- age_of : person -> int = <fun>
-
- #age_of jean;;
- - : int = 23
-
- It is also possible to access the value of a single field, with the . (dot)
- operator:
-
- #jean.Age;;
- - : int = 23
-
- #jean.Job;;
- - : string = "Student"
-
- Labels always refer to the most recent type definition: when two record
-
-
- 37
-
-
-
- types possess some common labels, then these labels always refer to the most
- recently defined type. When using modules (see section 11.2) this problem
- arises for types defined in the same module. For types defined in different
- modules, the full name of labels (i.e. with the name of the modules
- prepended) disambiguates such situations.
-
- 6.1.3 Parameterized product types
-
- It is important to be able to define parameterized types in order to define
- generic data structures. The list type is parameterized, and this is the
- reason why we may build lists of any kind of values. If we want to define
- the cartesian product as a Caml type, we need type parameters because we
- want to be able to build cartesian product of any pair of types.
-
- #type ('a,'b) pair = {Fst:'a; Snd:'b};;
- Type pair defined.
-
- #let first x = x.Fst and second x = x.Snd;;
- first : ('a, 'b) pair -> 'a = <fun>
- second : ('a, 'b) pair -> 'b = <fun>
-
- #let p={Snd=true; Fst=1+2};;
- p : (int, bool) pair = {Fst=3; Snd=true}
-
- #first(p);;
- - : int = 3
-
- Warning: the pair type is similar to the Caml cartesian product *, but it
- is a different type.
-
- #fst p;;
- Toplevel input:
- >fst p;;
- > ^
- This expression has type (int, bool) pair,
- but is used with type 'a * 'b.
-
- Actually, any two type definitions produce different types. If we enter
- again the previous definition:
-
- #type ('a,'b) pair = {Fst:'a; Snd:'b};;
- Type pair defined.
-
- we cannot any longer extract the Fst component of x:
-
- #p.Fst;;
- Toplevel input:
- >p.Fst;;
- >^
- This expression has type (int, bool) pair,
- but is used with type ('a, 'b) pair.
-
-
- 38
-
-
-
- since the label Fst refers to the latter type pair that we defined, while
- p's type is the former pair.
-
-
- 6.2 Sum types
-
- A sum type is the finite labeled disjoint union of several types. A sum
- type definition defines a type as being the union of some other types.
-
- 6.2.1 Defining sum types
-
- Example: we want to have a type called identification whose values can be:
-
- - either strings (name of an individual),
-
- - or integers (encoding of social security number as a pair of integers).
-
- We then need a type containing both int * int and character strings. We
- also want to preserve static type-checking, we thus want to be able to
- distinguish pairs from character strings even if they are injected in order
- to form a single type.
- Here is what we would do:
-
- #type identification = Name of string
- # | SS of int * int;;
- Type identification defined.
-
- The type identification is the labeled disjoint union of string and
- int * int. The labels Name and SS are injections. They respectively inject
- int * int and string into a single type identification.
- Let us use these injections in order to build new values:
-
- #let id1 = Name "Jean";;
- id1 : identification = Name "Jean"
-
- #let id2 = SS (1670728,280305);;
- id2 : identification = SS (1670728, 280305)
-
- Values id1 and id2 belong to the same type. They may for example be put
- into lists as in:
-
- #[id1;id2];;
- - : identification list = [Name "Jean"; SS (1670728, 280305)]
-
- Injections may possess one argument (as in the example above), or none.
- The latter case corresponds(1) to enumerated types, as in Pascal. An
- example of enumerated type is:
-
- ------------------------------
- 1. In Caml Light, there is no implicit order on values of sum types.
-
-
-
- 39
-
-
-
- #type suit = Heart
- # | Diamond
- # | Club
- # | Spade;;
- Type suit defined.
-
- #Club;;
- - : suit = Club
-
- The type suit contains only 4 distinct elements. Let us continue this
- example by defining a type for cards.
-
- #type card = Ace of suit
- # | King of suit
- # | Queen of suit
- # | Jack of suit
- # | Plain of suit * int;;
- Type card defined.
-
- The type card is the disjoint union of:
-
- - suit under the injection Ace,
-
- - suit under the injection King,
-
- - suit under the injection Queen,
-
- - suit under the injection Jack,
-
- - the product of int and suit under the injection Plain.
-
- Let us build a list of cards:
-
- #let figures_of c = [Ace c; King c; Queen c; Jack c]
- #and small_cards_of c =
- # map (function n -> Plain(c,n)) [7;8;9;10];;
- figures_of : suit -> card list = <fun>
- small_cards_of : suit -> card list = <fun>
-
- #figures_of Heart;;
- - : card list = [Ace Heart; King Heart; Queen Heart; Jack Heart]
-
- #small_cards_of Spade;;
- - : card list =
- [Plain (Spade, 7); Plain (Spade, 8); Plain (Spade, 9); Plain (Spade, 10)]
-
-
- 6.2.2 Extracting sum components
-
- Of course, pattern-matching is used to extract sum components. In case of
- sum types, pattern-matching uses dynamic tests for this extraction. The
-
-
- 40
-
-
-
- next example defines a function color returning the name of the color of the
- suit argument:
-
- #let color = function Diamond -> "red"
- # | Heart -> "red"
- # | _ -> "black";;
- color : suit -> string = <fun>
-
- Let us count the values of cards (assume we are playing ``belote''):
-
- #let count trump = function
- # Ace _ -> 11
- # | King _ -> 4
- # | Queen _ -> 3
- # | Jack c -> if c = trump then 20 else 2
- # | Plain (c,10) -> 10
- # | Plain (c,9) -> if c = trump then 14 else 0
- # | _ -> 0;;
- count : suit -> card -> int = <fun>
-
-
- 6.2.3 Recursive types
-
- Some types possess a naturally recursive structure, lists, for example. It
- is also the case for tree-like structures, since trees have subtrees that
- are trees themselves.
- Let us define a type for abstract syntax trees of a simple arithmetic
- language(2). An arithmetic expression will be either a numeric constant, or
- a variable, or the addition, multiplication, difference, or division of two
- expressions.
-
- #type arith_expr = Const of int
- # | Var of string
- # | Plus of args
- # | Mult of args
- # | Minus of args
- # | Div of args
- #and args = {Arg1:arith_expr; Arg2:arith_expr};;
- Type arith_expr defined.
- Type args defined.
-
- The two types arith_expr and args are simultaneously defined, and arith_expr
- is recursive since its definition refers to args which itself refers to
- arith_expr. As an example, one could represent the abstract syntax tree of
- the arithmetic expression ``x+(3*y)'' as the Caml value:
-
- ------------------------------
- 2. Syntax trees are said to be abstract because they are pieces of abstract
- syntax contrasting with concrete syntax which is the ``string'' form of
- programs: analyzing (parsing) concrete syntax usually produces abstract
- syntax.
-
-
- 41
-
-
-
- #Plus {Arg1=Var "x";
- # Arg2=Mult{Arg1=Const 3; Arg2=Var "y"}};;
- - : arith_expr = Plus {Arg1=Var "x"; Arg2=Mult {Arg1=Const 3; Arg2=Var "y"}}
-
- The recursive definition of types may lead to types such that it is hard
- or impossible to build values of these types. For example:
-
- #type stupid = {Next:stupid};;
- Type stupid defined.
-
- Elements of this type are infinite data structures. Essentially, the only
- way to construct one is:
-
- #let rec stupid_value = {Next=stupid_value};;
- stupid_value : stupid =
- {Next=
- {Next=
- {Next=
- {Next=
- {Next=
- {Next=
- {Next=
- {Next=
- {Next={Next={Next={Next={Next={Next={Next={Next=.}}}}}}}}}}}}}}
- }}
-
- Recursive type definitions should be well-founded (i.e. possess a
- non-recursive case, or base case) in order to work well with call-by-value.
-
- 6.2.4 Parameterized sum types
-
- Sum types may also be parameterized. Here is the definition of a type
- isomorphic to the list type:
-
- #type 'a sequence = Empty
- # | Sequence of 'a * 'a sequence;;
- Type sequence defined.
-
- A more sophisticated example is the type of generic binary trees:
-
- #type ('a,'b) btree = Leaf of 'b
- # | Btree of ('a,'b) node
- #and ('a,'b) node = {Op:'a;
- # Son1: ('a,'b) btree;
- # Son2: ('a,'b) btree};;
- Type btree defined.
- Type node defined.
-
- A binary tree is either a leaf (holding values of type 'b) or a node
- composed of an operator (of type 'a) and two sons, both of them being binary
- trees.
-
-
- 42
-
-
-
- Binary trees can also be used to represent abstract trees for arithmetic
- expressions (with only binary operators and only one kind of leaves). The
- abstract syntax tree t of ``1+2'' could be defined as:
-
- #let t = Btree {Op="+"; Son1=Leaf 1; Son2=Leaf 2};;
- t : (string, int) btree = Btree {Op="+"; Son1=Leaf 1; Son2=Leaf 2}
-
- Finally, it is time to notice that pattern-matching is not restricted to
- function bodies, i.e. constructs such as:
-
- function P1 -> E1
- | .. .
- | Pn -> En
-
- but there is also a construct dedicated to pattern-matching of actual
- values:
-
- match E with P1 -> E1
- | .. .
- | Pn -> En
-
- which matches the value of the expression E against each of the patterns
- Pi, selecting the first one that matches, and giving control to the
- corresponding expression. For example, we can match the tree t previously
- defined by:
-
- #match t with Btree{Op=x; Son1=_; Son2=_} -> x
- # | Leaf l -> "No operator";;
- - : string = "+"
-
-
- 6.2.5 Data constructors and functions
-
- One may ask: ``What is the difference between a sum data constructor and a
- function?''. At first sight, they look very similar. We assimilate
- constant data constructors (such as Heart) to constants. Similarly, in Caml
- Light, sum data constructors with arguments also possess a functional type:
-
- #Ace;;
- - : suit -> card = <fun>
-
- However, a data constructor possesses particular properties that a general
- function does not possess, and it is interesting to understand these
- differences. From the mathematical point of view, a sum data constructor is
- known to be an injection while a Caml function is a general function without
- further information. A mathematical injection f:A->B admits an inverse
- function f-1 from its image f(A)# B to A.
- From the examples above, if we consider the King constructor, then:
-
- #let king c = King c;;
- king : suit -> card = <fun>
-
-
- 43
-
-
-
- king is the general function associated to the King constructor, and:
-
- #function King c -> c;;
- Toplevel input:
- >function King c -> c;;
- >^^^^^^^^^^^^^^^^^^^^
- Warning: this matching is not exhaustive.
- - : card -> suit = <fun>
-
- is the inverse function for king. It is a partial function, since
- pattern-matching may fail.
-
- 6.2.6 Degenerate cases: when sums meet products
-
- What is the status of a sum type with a single case such as:
-
- #type counter1 = Counter of int;;
- Type counter1 defined.
-
- Of course, the type counter1 is isomorphic to int. The injection
- function x -> Counter x is a total function from int to counter1. It is
- thus a bijection.
- Another way to define a type isomorphic to int would be:
-
- #type counter2 = {Counter: int};;
- Type counter2 defined.
-
- The types counter1 and counter2 are isomorphic to int. They are at the same
- time sum and product types. Their pattern-matching does not perform any
- run-time test.
- The possibility of defining arbitrary complex data types permits the easy
- manipulation of abstract syntax trees in Caml (such as the arith_expr type
- above). These abstract syntax trees are supposed to represent programs of a
- language (e.g. a language of arithmetic expressions). These kind of
- languages which are defined in Caml are called object-languages and Caml is
- said to be their metalanguage.
-
-
- 6.3 Summary
-
- - New types can be introduced in Caml.
-
- - Types may be parameterized by type variables. The syntax of type
- parameters is:
-
-
- <params> ::
- | <tvar>
- | ( <tvar> [, <tvar>]* )
-
-
-
-
- 44
-
-
-
- - Types can be recursive.
-
- - Product types:
-
-
- - Mathematical product of several types.
-
- - The construct is:
-
-
- type <params> <tname> =
- {<Field>: <type>; ...}
-
-
-
- where the <type>s may contain type variables appearing in <params>.
-
-
- - Sum types:
-
-
- - Mathematical disjoint union of several types.
-
- - The construct is:
-
-
- type <params> <tname> =
- <Injection> [of <type>] | ...
-
-
-
- where the <type>s may contain type variables appearing in <params>.
-
-
-
- Exercises
-
- Exercise 6.1 Define a function taking as argument a binary tree and
- returning a pair of lists: the first one contains all operators of the
- tree, the second one contains all its leaves.
-
- Exercise 6.2 Define a function map_btree analogous to the map function on
- lists. The function map_btree should take as arguments two functions f and
- g, and a binary tree. It should return a new binary tree whose leaves are
- the result of applying f to the leaves of the tree argument, and whose
- operators are the results of applying the g function to the operators of the
- argument.
-
- Exercise 6.3 We can associate to the list type definition an canonical
- iterator in the following way. We give a functional interpretation to the
- data constructors of the list type.
-
-
- 45
-
-
-
- We change the list constructors [] and :: respectively into a constant a
- and an operator + (used as a prefix identifier), and abstract with respect
- to these two operators, obtaining the list iterator satisfying:
-
- listit + a [] = a
- listit + a (x1::.. .::xn::[]) = x1 + (.. .+ (xn + a)... )
-
- Its Caml definition would be:
-
- #let rec list_it f a =
- # function [] -> a
- # | x::l -> f x (list_it f a l);;
- list_it : ('a -> 'b -> 'b) -> 'b -> 'a list -> 'b = <fun>
-
- As an example, the application of it_list to the functional composition and
- to its neutral element (the identity function), computes the composition of
- lists of functions (try it!).
- Define, using the same method, a canonical iterator over binary trees.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 46
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Part II
-
-
-
- Caml Light specifics
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 47
-
-
-
-
-
-
-
-
-
- Chapter 7
-
-
-
- Mutable data structures
-
-
-
- The definition of a sum or product type may be annotated to allow physical
- (destructive) update on data structures of that type. This is the main
- feature of the imperative programming style. Writing values into memory
- locations is the fundamental mechanism of imperative languages such as
- Pascal. The Lisp language, while mostly functional, also provides the
- dangerous functions rplaca and rplacd to physically modify lists. Mutable
- structures are required to implement many efficient algorithms. They are
- also very convenient to represent the current state of a state machine.
-
-
- 7.1 User-defined mutable data structures
-
- Assume we want to define a type person as in the previous chapter. Then, it
- seems natural to allow a person to change his/her age, job and the city that
- person lives in, but not his/her name. We can do this by annotating some
- labels in the type definition of person by the mutable keyword:
-
- #type person =
- # {Name: string; mutable Age: int;
- # mutable Job: string; mutable City: string};;
- Type person defined.
-
- We can build values of type person in the very same way as before:
-
- #let jean =
- # {Name="Jean"; Age=23; Job="Student"; City="Paris"};;
- jean : person = {Name="Jean"; Age=23; Job="Student"; City="Paris"}
-
- But now, the value jean may be physically modified in the fields specified
- to be mutable in the definition (and only in these fields).
- We can modify the field Field of an expression <expr1> in order to assign
- it the value of <expr2> by using the following construct:
-
- <expr1>.Field <- <expr2>
-
- For example; if we want jean to become one year older, we would write:
-
-
-
- 48
-
-
-
- #jean.Age <- jean.Age + 1;;
- - : unit = ()
-
- Now, the value jean has been modified into:
-
- #jean;;
- - : person = {Name="Jean"; Age=24; Job="Student"; City="Paris"}
-
- We may try to change the Name of jean, but we won't succeed: the typecheker
- will not allow us to do that.
-
- #jean.Name <- "Paul";;
- Toplevel input:
- >jean.Name <- "Paul";;
- >^^^^^^^^^^^^^^^^^^^
- The label Name is not mutable.
-
- It is of course possible to use such constructs in functions as in:
-
- #let get_older ({Age=n; _} as p) = p.Age <- n + 1;;
- get_older : person -> unit = <fun>
-
- In that example, we named n the current Age of the argument, but we also
- named p the argument. This is an alias pattern: it saves us the bother of
- writing:
-
- #let get_older p =
- # match p with {Age=n} -> p.Age <- n + 1;;
- get_older : person -> unit = <fun>
-
- Notice that in the two previous expressions, we did not specify all fields
- of the record p. Other examples would be:
-
- #let move p new_city = p.City <- new_city
- #and change_job p j = p.Job <- j;;
- move : person -> string -> unit = <fun>
- change_job : person -> string -> unit = <fun>
-
- #change_job jean "Teacher"; move jean "Cannes";
- #get_older jean; jean;;
- - : person = {Name="Jean"; Age=25; Job="Teacher"; City="Cannes"}
-
- We used the ``;'' character between the different changes we imposed to
- jean. This is the sequencing of evaluations: it permits to evaluate
- successively several expressions, discarding the result of each (except the
- last one). This construct becomes useful in the presence of side-effects
- such as physical modifications and input/output, since we want to explicitly
- specify the order in which they are performed.
-
-
-
-
-
- 49
-
-
-
- 7.2 The ref type
-
- The ref type is the predefined type of mutable indirection cells. It is
- present in the Caml system for reasons of compatibility with earlier
- versions of Caml. The ref type could be defined as follows (we don't use
- the ref name in the following definition because we want to preserve the
- original ref type):
-
- #type 'a reference = {mutable Ref: 'a};;
- Type reference defined.
-
- Example of building a value of type ref:
-
- #let r = ref (1+2);;
- r : int ref = ref 3
-
- The ref identifier is syntactically presented as a sum data constructor.
- The definition of r should be read as ``let r be a reference to the value of
- 1+2''. The value of r is nothing but a memory location whose contents can
- be overwritten.
- We consult a reference (i.e. read its memory location) with the ``!''
- symbol:
-
- #!r + 1;;
- - : int = 4
-
- We modify values of type ref with the := infix function:
-
- #r:=!r+1;;
- - : unit = ()
-
- #r;;
- - : int ref = ref 4
-
- Some primitives are attached to the ref type, for example:
-
- #incr;;
- - : int ref -> unit = <fun>
-
- #decr;;
- - : int ref -> unit = <fun>
-
- which increments (resp. decrements) references on integers.
-
-
- 7.3 Arrays
-
- Arrays are modifiable data structures. They belong to the parameterized
- type 'a vect. Array expressions are bracketed by [| and |], and elements
- are separated by semicolons:
-
-
-
- 50
-
-
-
- #let a = [| 10; 20; 30 |];;
- a : int vect = [|10; 20; 30|]
-
- The length of an array is returned by with the function vect_length:
-
- #vect_length a;;
- - : int = 3
-
-
- 7.3.1 Accessing array elements
-
- Accesses to array elements can be done using the following syntax:
-
- #a.(0);;
- - : int = 10
-
- or, more generally: e1.(e2), where e1 evaluates to an array and e2 to an
- integer. Alternatively, the function vect_item is provided:
-
- #vect_item;;
- - : 'a vect -> int -> 'a = <fun>
-
- The first element of an array is at index 0. Arrays are useful because
- accessing an element is done in constant time: an array is a contiguous
- fragment of memory, while accessing list elements takes linear time.
-
- 7.3.2 Modifying array elements
-
- Modification of an array element is done with the construct:
-
- e1.(e2) <- e3
-
- where e3 has the same type as the elements of the array e1. The expression
- e2 computes the index at which the modification will occur.
- As for accessing, a function for modifying array elements is also
- provided:
-
- #vect_assign;;
- - : 'a vect -> int -> 'a -> unit = <fun>
-
- For example:
-
- #a.(0) <- (a.(0)-1);;
- - : unit = ()
-
- #a;;
- - : int vect = [|9; 20; 30|]
-
- #vect_assign a 0 ((vect_item a 0) - 1);;
- - : unit = ()
-
-
-
- 51
-
-
-
- #a;;
- - : int vect = [|8; 20; 30|]
-
-
-
- 7.4 Loops: while and for
-
- Imperative programming (i.e. using side-effects such as physical
- modification of data structures) traditionally makes use of sequences and
- explicit loops. Sequencing evaluation in Caml Light is done by using the
- semicolon ``;''. Evaluating expression e1, discarding the value returned,
- and then evaluating e2 is written:
-
- e1 ; e2
-
- If e1 and e2 perform side-effects, this construct ensures that they will be
- performed in the specified order (from left to right). In order to
- emphasize sequential side-effects, instead of using parentheses around
- sequences, one can use begin and end, as in:
-
- #let x = ref 1 in
- # begin
- # x := !x + 1;
- # x := !x * !x
- # end;;
- - : unit = ()
-
- The keywords begin and end are equivalent to opening and closing
- parentheses. The program above could be written as:
-
- #let x = ref 1 in
- # (x := !x + 1; x := !x * !x);;
- - : unit = ()
-
- Explicit loops are not strictly necessary per se: a recursive function
- could perform the same work. However, the usage of an explicit loop locally
- emphasizes a more imperative style. Two loops are provided:
-
- - while: while e1 do e2 done evaluates e1 which must return a boolean
- expression, if e1 return true, then e2 (which is usually a sequence) is
- evaluated, then e1 is evaluated again and so on until e1 returns false.
-
- - for: two variants, increasing and decreasing
-
-
- - for v=e1 to e2 do e3 done
-
- - for v=e1 downto e2 do e3 done
-
-
-
-
-
- 52
-
-
-
- where v is an identifier. Expressions e1 and e2 are the bounds of the
- loop: they must evaluate to integers. In the case of the increasing
- loop, the expressions e1 and e2 are evaluated producing values n1 and
- n2 : if n1 is strictly greater than n2, then nothing is done.
- Otherwise, e3 is evaluated (n2-n1 )+1 times, with the variable v bound
- successively to n1, n1+1, . .., n2.
-
- The behavior of the decreasing loop is similar: if n1 is strictly
- smaller than n2, then nothing is done. Otherwise, e3 is evaluated
- (n1-n2) +1 times with v bound to successive values decreasing from n1
- to n2.
-
- Both loops return the value () of type unit.
-
- #for i=0 to (vect_length a) - 1 do a.(i) <- i done;;
- - : unit = ()
-
- #a;;
- - : int vect = [|0; 1; 2|]
-
-
-
- 7.5 Polymorphism and mutable data structures
-
- There are some restrictions concerning polymorphism and mutable data
- structures. One cannot enclose polymorphic objects inside mutable data
- structures.
-
- #let r = ref [];;
- r : '_a list ref = ref []
-
- The reason is that once the type of r, ('a list) ref, has been computed, it
- cannot be changed. But the value of r can be changed: we could write:
-
- r:=[1;2];;
-
- and now, r would be a reference on a list of numbers while its type would
- still be ('a list) ref, allowing us to write:
-
- r:= true::!r;;
-
- making r a reference on [true; 1; 2], which is an illegal Caml object.
- Thus the Caml typechecker imposes that modifiable data structures
- appearing at toplevel must possess monomorphic types (i.e. not
- polymorphic).
-
-
- Exercises
-
- Exercise 7.1 Give a mutable data type defining the Lisp type of lists and
- define the four functions car, cdr, rplaca and rplacd.
-
-
-
- 53
-
-
-
- Exercise 7.2 Define a stamp function, of type unit -> int, that returns a
- fresh integer each time it is called. That is, the first call returns 1;
- the second call returns 2; and so on.
-
- Exercise 7.3 Define a quick_sort function on arrays of floating point
- numbers following the quicksort algorithm [13]. Information about the
- quicksort algorithm can be found in [33], for example.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 54
-
-
-
-
-
-
-
-
-
- Chapter 8
-
-
-
- Escaping from computations: exceptions
-
-
-
- In some situations, it is necessary to abort computations. If we are trying
- to compute the integer division of an integer n by 0, then we must escape
- from that embarrassing situation without returning any result.
- Another example of the usage of such an escape mechanism appears when we
- want to define the head function on lists:
-
- #let head = function
- # x::L -> x
- # | [] -> raise (Failure "head: empty list");;
- Toplevel input:
- > x::L -> x
- > ^
- Warning: the variable L starts with an upper case letter in this pattern.
- head : 'a list -> 'a = <fun>
-
- We cannot give a regular value to the expression head [] without losing the
- polymorphism of head. We thus choose to escape: we raise an exception.
-
-
- 8.1 Exceptions
-
- An exception is a Caml value of the built-in type exn, very similar to a sum
- type. An exception:
-
- - has a name (Failure in our example),
-
- - and holds zero or one value ("head: empty list" of type string in the
- example).
-
- Some exceptions are predefined, like Failure. New exceptions can be
- defined with the following construct:
-
- exception <exception name> [of <type>]
-
- Example:
-
- #exception Found of int;;
-
-
- 55
-
-
-
- Exception Found defined.
-
- The exception Found has been declared, and it carries integer values. When
- we apply it to an integer, we get an exception value (of type exn):
-
- #Found 5;;
- - : exn = Found 5
-
-
-
- 8.2 Raising an exception
-
- Raising an exception is done by applying the primitive function raise to a
- value of type exn:
-
- #raise;;
- - : exn -> 'a = <fun>
-
- #raise (Found 5);;
- Uncaught exception: Found 5
-
- Here is a more realistic example:
-
- #let find_index p =
- # let rec find n =
- # function [] -> raise (Failure "not found")
- # | x::L -> if p(x) then raise (Found n)
- # else find (n+1) L
- # in find 1;;
- Toplevel input:
- > | x::L -> if p(x) then raise (Found n)
- > ^
- Warning: the variable L starts with an upper case letter in this pattern.
- find_index : ('a -> bool) -> 'a list -> 'b = <fun>
-
- The find_index function always fails. It raises:
-
- - Found n, if there is an element x of the list such that p(x), in this
- case n is the index of x in the list,
-
- - the Failure exception if no such x has been found.
-
- Raising exceptions is more than an error mechanism: it is a programmable
- control structure. In the find_index example, there was no error when
- raising the Found exception: we only wanted to quickly escape from the
- computation, since we found what we were looking for. This is why it must
- be possible to trap exceptions: we want to trap possible errors, but we
- also want to get our result in the case of the find_index function.
-
-
-
-
-
- 56
-
-
-
- 8.3 Trapping exceptions
-
- Trapping exceptions is achieved by the following construct:
-
- try <expression> with <match cases>
-
- This construct evaluates <expression>. If no exception is raised during
- the evaluation, then the result of the try construct is the result of
- <expression>. If an exception is raised during this evaluation, then the
- raised exception is matched against the <match cases>. If a case matches,
- then control is passed to it. If no case matches, then the exception is
- propagated outside of the try construct, looking for the enclosing try.
- Example:
-
- #let find_index p L =
- # let rec find n =
- # function [] -> raise (Failure "not found")
- # | x::L -> if p(x) then raise (Found n)
- # else find (n+1) L
- # in
- # try find 1 L with Found n -> n;;
- Toplevel input:
- >let find_index p L =
- > ^
- Warning: the variable L starts with an upper case letter in this pattern.
- Toplevel input:
- > | x::L -> if p(x) then raise (Found n)
- > ^
- Warning: the variable L starts with an upper case letter in this pattern.
- find_index : ('a -> bool) -> 'a list -> int = <fun>
-
- #find_index (function n -> (n mod 2) = 0) [1;3;5;7;9;10];;
- - : int = 6
-
- #find_index (function n -> (n mod 2) = 0) [1;3;5;7;9];;
- Uncaught exception: Failure "not found"
-
- The <match cases> part of the try construct is a regular pattern matching on
- values of type exn. It is thus possible to trap any exception by using the
- _ symbol. As an example, the following function traps any exception raised
- during the application of its two arguments. Warning: the _ will also trap
- interrupts from the keyboard such as control-C!
-
- #let catch_all f arg default =
- # try f(arg) with _ -> default;;
- catch_all : ('a -> 'b) -> 'a -> 'b -> 'b = <fun>
-
- It is even possible to catch all exceptions, do something special (close or
- remove opened files, for example), and raise again that exception, to
- propagate it upwards.
-
-
-
- 57
-
-
-
- #let show_exceptions f arg =
- # try f(arg) with x -> print_string "Exception raised!\n"; raise x;;
- show_exceptions : ('a -> 'b) -> 'a -> 'b = <fun>
-
- In the example above, we print a message to the standard output channel (the
- terminal), before raising again the trapped exception.
-
- #catch_all (function x -> raise (Failure "foo")) 1 0;;
- - : int = 0
-
- #catch_all (show_exceptions (function x -> raise (Failure "foo"))) 1 0;;
- Exception raised!
- - : int = 0
-
-
-
- 8.4 Polymorphism and exceptions
-
- Exceptions must not be polymorphic for a reason similar to the one for
- references (although it is a bit harder to give an example).
-
- #exception Exc of 'a list;;
- Toplevel input:
- >exception Exc of 'a list;;
- > ^^
- The type variable a is unbound.
-
- One reason is that the excn type is not a parameterized type, but one deeper
- reason is that if the exception Exc is declared to be polymorphic, then a
- function may raise Exc [1;2]. There might be no mention of that fact in the
- type inferred for the function. Then, another function may trap that
- exception, obtaining the value [1;2] whose real type is int list. But the
- only type known by the typechecker is 'a list: the try form should refer to
- the Exc data constructor, which is known to be polymorphic. It may then be
- possible to build an ill-typed Caml value [true; 1; 2], since the
- typechecker does not possess any further type information than 'a list.
- The problem is thus the absence of static connection from exceptions that
- are raised and the occurrences where they are trapped. Another example
- would be the one of a function raising Exc with an integer or a boolean
- value, depending on some condition. Then, in that case, when trying to trap
- these exceptions, we cannot decide wether they will hold integers or boolean
- values.
-
-
- Exercises
-
- Exercise 8.1 Define the function find_succeed which given a function f and
- a list L returns the first element of L on which the application of f
- succeeds.
-
- Exercise 8.2 Define the function map_succeed which given a function f and a
- list L returns the list of the results of successful applications of f to
-
-
- 58
-
-
-
- elements of L.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 59
-
-
-
-
-
-
-
-
-
- Chapter 9
-
-
-
- Basic input/output
-
-
-
- We describe in this chapter the Caml Light input/output model and some of
- its primitive operations. More complete information about IO can be found
- in the Caml Light manual [20].
- Caml Light has an imperative input/output model: an IO operation should
- be considered as a side-effect, and is thus dependent on the order of
- evaluation. IOs are performed onto channels with types in_channel and
- out_channel. These types are abstract, i.e. their representation is not
- accessible.
- Three channels are predefined:
-
- #std_in;;
- - : in_channel = <abstr>
-
- #std_out;;
- - : out_channel = <abstr>
-
- #std_err;;
- - : out_channel = <abstr>
-
- They are the ``standard'' IO channels: std_in is usually connected to the
- keyboard, and printing onto std_out and std_err usually appears on the
- screen.
-
-
- 9.1 Printable types
-
- It is not possible to print and read every value. Functions, for example,
- are typically not readable, unless a suitable string representation is
- designed and reading such a representation is followed by an interpretation
- computing the desired function.
- We call printable type a type for which there are input/output primitives
- implemented in Caml Light. The main printable types are:
-
- - characters: type char;
-
- - strings: type string;
-
-
-
- 60
-
-
-
- - integers: type int;
-
- - floating point numbers: type float.
-
- We know all these types from the previous chapters. Strings and characters
- support a notation for escaping to ASCII codes or to denote special
- characters such as newline:
-
- #`A`;;
- - : char = `A`
-
- #`\065`;;
- - : char = `A`
-
- #`\\`;;
- - : char = `\\`
-
- #`\n`;;
- - : char = `\n`
-
- #"string with\na newline inside";;
- - : string = "string with\na newline inside"
-
- The ``\'' character is used as an escape and is useful for non-printable or
- special characters.
- Of course, character constants can be used as constant patterns:
-
- #function `a` -> 0 | _ -> 1;;
- - : char -> int = <fun>
-
- On types such as char that have a finite number of constant elements, it may
- be useful to use or-patterns, gathering constants in the same matching rule:
-
- #let is_vowel = function
- # `a` | `e` | `i` | `o` | `u` | `y` -> true
- #| _ -> false;;
- is_vowel : char -> bool = <fun>
-
- The first rule is chosen if the argument matches one of the cases. Since
- there is a total ordering on characters, the syntax of character patterns is
- enriched with a ``..'' notation:
-
- #let is_lower_case_letter = function
- # `a`..`z` -> true
- #| _ -> false;;
- is_lower_case_letter : char -> bool = <fun>
-
- Of course, or-patterns and this notation can be mixed, as in:
-
- #let is_letter = function
- # `a`..`z` | `A`..`Z` -> true
-
-
- 61
-
-
-
- #| _ -> false;;
- is_letter : char -> bool = <fun>
-
- In the next sections, we give the most commonly used IO primitives on
- these printable types. A complete listing of predefined IO operations is
- given in [20].
-
-
- 9.2 Output
-
- Printing on standard output is performed by the following functions:
-
- #print_char;;
- - : char -> unit = <fun>
-
- #print_string;;
- - : string -> unit = <fun>
-
- #print_int;;
- - : int -> unit = <fun>
-
- #print_float;;
- - : float -> unit = <fun>
-
- Printing is buffered, i.e. the effect of a call to a printing function may
- not be seen immediately: flushing explicitly the output buffer is sometimes
- required, unless a printing function flushes it implicitly. Flushing is
- done with the flush function:
-
- #flush;;
- - : out_channel -> unit = <fun>
-
- #print_string "Hello!"; flush std_out;;
- Hello!- : unit = ()
-
- The print_newline function prints a newline character and flushes the
- standard output:
-
- #print_newline;;
- - : unit -> unit = <fun>
-
- Flushing is required when writing standalone applications, in which the
- application may terminate without all printing being done. Standalone
- applications should terminate by a call to the exit function (from the io
- module), which flushes all pending output on std_out and std_err.
- Printing on the standard error channel std_err is done with the following
- functions:
-
- #prerr_char;;
- - : char -> unit = <fun>
-
-
-
- 62
-
-
-
- #prerr_string;;
- - : string -> unit = <fun>
-
- #prerr_int;;
- - : int -> unit = <fun>
-
- #prerr_float;;
- - : float -> unit = <fun>
-
- The following function prints its string argument followed by a newline
- character to std_err and then flushes std_err.
-
- #prerr_endline;;
- - : string -> unit = <fun>
-
-
-
- 9.3 Input
-
- These input primitives flush the standard output and read from the standard
- input:
-
- #read_line;;
- - : unit -> string = <fun>
-
- #read_int;;
- - : unit -> int = <fun>
-
- #read_float;;
- - : unit -> float = <fun>
-
- Because of their names and types, these functions do not need further
- explanation.
-
-
- 9.4 Channels on files
-
- When programs have to read from or print to files, it is necessary to open
- and close channels on these files.
-
- 9.4.1 Opening and closing channels
-
- Opening and closing is performed with the following functions:
-
- #open_in;;
- - : string -> in_channel = <fun>
-
- #open_out;;
- - : string -> out_channel = <fun>
-
- #close_in;;
- - : in_channel -> unit = <fun>
-
-
- 63
-
-
-
- #close_out;;
- - : out_channel -> unit = <fun>
-
- The open_in function checks the existence of its filename argument, and
- returns a new input channel on that file; open_out creates a new file (or
- truncates it to zero length if it exists) and returns an output channel on
- that file. Both functions fail if permissions are not sufficient for
- reading or writing.
- Warning:
-
- - Closing functions close their channel argument. Since their behavior is
- unspecified on already closed channels, anything can happen in this
- case!
-
- - Closing one of the standard IO channels (std_in, std_out, std_err) have
- unpredictable effects!
-
- 9.4.2 Reading or writing from/to specified channels
-
- Some of the functions on standard input/output have corresponding functions
- working on channels:
-
- #output_char;;
- - : out_channel -> char -> unit = <fun>
-
- #output_string;;
- - : out_channel -> string -> unit = <fun>
-
- #input_char;;
- - : in_channel -> char = <fun>
-
- #input_line;;
- - : in_channel -> string = <fun>
-
-
- 9.4.3 Failures
-
- The exception End_of_file is raised when an input operation cannot complete
- because the end of the file has been reached.
-
- #End_of_file;;
- - : exn = End_of_file
-
- The exception sys__Sys_error (Sys_error from the module sys) is raised
- when some manipulation of files is forbidden by the operating system:
-
- #open_in "abracadabra";;
- Uncaught exception: sys__Sys_error "abracadabra: No such file or directory"
-
- The functions that we have seen in this chapter are sufficient for our
- needs. Many more exist (useful mainly when working with files) and are
-
-
- 64
-
-
-
- described in [20].
-
-
- Exercises
-
- Exercise 9.1 Define a function copy_file taking two filenames (of type
- string) as arguments, and copying the contents of the first file on the
- second one. Error messages must be printed on std_err.
-
- Exercise 9.2 Define a function wc taking a filename as argument and
- printing on the standard output the number of characters and lines appearing
- in the file. Error messages must be printed on std_err.
-
- Note: it is good practice to develop a program in defining small functions.
- A single function doing the whole work is usually harder to debug and to
- read. With small functions, one can trace them and see the arguments they
- are called on and the result they produce.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 65
-
-
-
-
-
-
-
-
-
- Chapter 10
-
-
-
- Streams and parsers
-
-
-
- In the next part of these course notes, we will implement a small functional
- language. Parsing valid programs of this language requires writing a
- lexical analyzer and a parser for the language. For the purpose of writing
- easily such programs, Caml Light provides a special data structure:
- streams. Their main usage is to be interfaced to input channels or strings
- and to be matched against stream patterns.
-
-
- 10.1 Streams
-
- Streams belong to an abstract data type: their actual representation
- remains hidden from the user. However, it is still possible to build
- streams either ``by hand'' or by using some predefined functions.
-
- 10.1.1 The stream type
-
- The type stream is a parameterized type. One can build streams of integers,
- of characters or of any other type. Streams receive a special syntax,
- looking like the one for lists. The empty stream is written:
-
- #[< >];;
- - : '_a stream = <abstr>
-
- A non empty stream possesses elements that are written preceded by the ``'''
- (quote) character.
-
- #[< '0; '1; '2 >];;
- - : int stream = <abstr>
-
- Elements that are not preceded by ``''' are substreams that are expanded in
- the enclosing stream:
-
- #[< '0; [<'1;'2>]; '3 >];;
- - : int stream = <abstr>
-
- #let s = [< '"abc" >] in [< s; '"def" >];;
-
-
- 66
-
-
-
- - : string stream = <abstr>
-
- Thus, stream concatenation can be defined as:
-
- #let stream_concat s t = [< s; t >];;
- stream_concat : 'a stream -> 'a stream -> 'a stream = <fun>
-
- Building streams in this way can be useful while testing a parsing function
- or defining a lexical analyzer (taking as argument a stream of characters
- and returning a stream of tokens). Stream concatenation does not copy
- substreams: they are simply put in the same stream. Since (as we will see
- later) stream matching has a destructive effect on streams (streams are
- physically ``eaten'' by stream matching), parsing [< t; t >] will in fact
- parse t only once: the first occurrence of t will be consumed, and the
- second occurrence will be empty before its parsing will be performed.
- Interfacing streams with an input channel can be done with the function:
-
- #stream_of_channel;;
- - : in_channel -> char stream = <fun>
-
- returning a stream of characters which are read from the channel argument.
- The end of stream will coincide with the end of the file associated to the
- channel.
- In the same way, one can build the character stream associated to a
- character string using:
-
- #stream_of_string;;
- - : string -> char stream = <fun>
-
- #let s = stream_of_string "abc";;
- s : char stream = <abstr>
-
-
- 10.1.2 Streams are lazily evaluated
-
- Stream expressions are submitted to lazy evaluation, i.e. they are
- effectively build only when required. This is useful in that it allows for
- the easy manipulation of ``interactive'' streams like the stream built from
- the standard input. If this was not the case, i.e. if streams were
- immediately completely computed, a program evaluating
- ``stream_of_channel std_in'' would read everything up to an end-of-file on
- standard input before giving control to the rest of the program.
- Furthermore, lazy evaluation of streams allows for the manipulation of
- infinite streams. As an example, we can build the infinite stream of
- integers, using side effects to show precisely when computations occur:
-
- #let rec ints_from n =
- # [< '(print_int n; print_char ` `; flush std_out; n);
- # ints_from (n+1) >];;
- ints_from : int -> int stream = <fun>
-
-
-
- 67
-
-
-
- #let ints = ints_from 0;;
- ints : int stream = <abstr>
-
- We notice that no printing occurred and that the program terminates: this
- shows that none of the elements have been evaluated and that the infinite
- stream has not been built. We will see in the next section that these
- side-effects will occur on demand, i.e. when tests will be needed by a
- matching function on streams.
-
-
- 10.2 Stream matching and parsers
-
- The syntax for building streams can be used for pattern-matching over them.
- However, stream matching is more complex than the usual pattern matching.
-
- 10.2.1 Stream matching is destructive
-
- Let us start with a simple example:
-
- #let next = function [< 'x >] -> x;;
- next : 'a stream -> 'a = <fun>
-
- The next function returns the first element of its stream argument, and
- fails if the stream is empty:
-
- #let s = [< '0; '1; '2 >];;
- s : int stream = <abstr>
-
- #next s;;
- - : int = 0
-
- #next s;;
- - : int = 1
-
- #next s;;
- - : int = 2
-
- #next s;;
- Uncaught exception: Parse_failure
-
- We can see from the previous examples that the stream pattern [< 'x >]
- matches an initial segment of the stream. Such a pattern must be read as
- ``the stream whose first element matches x''. Furthermore, once stream
- matching has succeeded, the stream argument has been physically modified and
- does not contain any longer the part that has been recognized by the next
- function.
- If we come back to the infinite stream of integers, we can see that the
- calls to next provoke the evaluation of the necessary part of the stream:
-
- #next ints; next ints; next ints;;
- 0 1 2 - : int = 2
-
-
- 68
-
-
-
- Thus, successive calls to next remove the first elements of the stream until
- it becomes empty. Then, next fails when applied to the empty stream, since,
- in the definition of next, there is no stream pattern that matches an
- initial segment of the empty stream.
- It is of course possible to specify several stream patterns as in:
-
- #let next = function
- # [< 'x >] -> x
- #| [< >] -> raise (Failure "empty");;
- next : 'a stream -> 'a = <fun>
-
- Cases are tried in turn, from top to bottom.
- Stream pattern components are not restricted to quoted patterns (intended
- to match stream elements), but can be also function calls (corresponding to
- non-terminals, in the grammar terminology). Functions appearing as stream
- pattern components are intended to match substreams of the stream argument:
- they are called on the actual stream argument, and they are followed by a
- pattern which should match the result of this call. For example, if we
- define a parser recognizing a non empty sequence of characters `a`:
-
- #let seq_a =
- # let rec seq = function
- # [< '`a`; seq l >] -> `a`::l
- # | [< >] -> []
- # in function [< '`a`; seq l >] -> `a`::l;;
- seq_a : char stream -> char list = <fun>
-
- we used the recursively defined function seq inside the stream pattern of
- the first rule. This definition should be read as:
-
- - if the stream is not empty and if its first element matches `a`, apply
- seq to the rest of the stream, let l be the result of this call and
- return `a`::l,
-
- - otherwise, fail (raise Parse_failure);
-
- and seq should be read in the same way (except that, since it recognizes
- possibly empty sequences of `a`, it never fails).
- Less operationally, we can read it as: ``a non-empty sequence of `a`
- starts with an `a`, and is followed by a possibly empty sequence of `a`.
- Another example is the recognition of a non-empty sequence of `a` followed
- by a `b`, or a `b` alone:
-
- #let seq_a_b = function
- # [< seq_a l; '`b` >] -> l@[`b`]
- #| [< '`b` >] -> [`b`];;
- seq_a_b : char stream -> char list = <fun>
-
- Here, operationally, once an `a` has been recognized, the first matching
- rule is chosen. Any further mismatch (either from seq_a or from the last
- `b`) will raise a Parse_error exception, and the whole parsing will fail.
-
-
- 69
-
-
-
- On the other hand, if the first character is not an `a`, seq_a will raise
- Parse_failure, and the second rule ([< '`b` >] -> ...) will be tried.
- This behavior is typical of predictive parsers. Predictive parsing is
- recursive-descent parsing with one look-ahead token. In other words, a
- predictive parser is a set of (possibly mutually recursive) procedures,
- which are selected according to the shape of (at most) the first token.
-
- 10.2.2 Sequential binding in stream patterns
-
- Bindings in stream patterns occur sequentially, in contrast with bindings in
- regular patterns, which can be thought as occurring in parallel. Stream
- matching is guaranteed to be performed from left to right. For example,
- computing the sum of the elements of an integer stream could be defined as:
-
- #let rec stream_sum n = function
- # [< '0; (stream_sum n) p >] -> p
- #| [< 'x; (stream_sum (n+x)) p >] -> p
- #| [< >] -> n;;
- stream_sum : int -> int stream -> int = <fun>
-
- #stream_sum 0 [< '0; '1; '2; '3; '4 >];;
- - : int = 10
-
- The stream_sum function uses its first argument as an accumulator holding
- the sum computed so far. The call (stream_sum (n+x)) uses x which was bound
- in the stream pattern component occurring at the left of the call.
- Warning: streams patterns are legal only in the function and match
- constructs. The let and other forms are restricted to usual patterns.
- Furthermore, a stream pattern cannot appear inside another pattern.
-
-
- 10.3 Parameterized parsers
-
- Since a parser is a function like any other function, it can be
- parameterized or be used as a parameter. Parameters used only in the
- right-hand side of stream-matching rules simulate inherited attributes of
- attribute grammars. Parameters used as parsers in stream patterns allow for
- the implementation of higher-order parsers. We will use the next example to
- motivate the introduction of parameterized parsers.
-
- 10.3.1 Example: a parser for arithmetic expressions
-
- Before building a parser for arithmetic expressions, we need a lexical
- analyzer able to recognize arithmetic operations and integer constants. Let
- us first define a type for tokens:
-
- #type token =
- # PLUS | MINUS | TIMES | DIV | LPAR | RPAR
- #| INT of int;;
- Type token defined.
-
-
-
- 70
-
-
-
- Skipping blank spaces is performed by the spaces function defined as:
-
- #let rec spaces = function
- # [< '` `|`\t`|`\n`; spaces _ >] -> ()
- #| [< >] -> ();;
- spaces : char stream -> unit = <fun>
-
- The conversion of a digit (character) into its integer value is done by:
-
- #let int_of_digit = function
- # `0`..`9` as c -> (int_of_char c) - (int_of_char `0`)
- #| _ -> raise (Failure "not a digit");;
- int_of_digit : char -> int = <fun>
-
- The ``as'' keyword allows for naming a pattern: in this case, the variable
- c will be bound to the actual digit matched by `0`..`9`. Pattern built with
- as are called alias patterns.
- For the recognition of integers, we already feel the need for a
- parameterized parser. Integer recognition is done by the integer analyzer
- defined below. It is parameterized by a numeric value representing the
- value of the first digits of the number:
-
- #let rec integer n = function
- # [< ' `0`..`9` as c; (integer (10*n + int_of_digit c)) r >] -> r
- #| [< >] -> n;;
- integer : int -> char stream -> int = <fun>
-
- #integer 0 (stream_of_string "12345");;
- - : int = 12345
-
- We are now ready to write the lexical analyzer, taking a stream of
- characters, and returning a stream of tokens. Returning a token stream
- which will be explored by the parser is a simple, reasonably efficient and
- intuitive way of composing a lexical analyzer and a parser.
-
- #let rec lexer s = match s with
- # [< '`(`; spaces _ >] -> [< 'LPAR; lexer s >]
- #| [< '`)`; spaces _ >] -> [< 'RPAR; lexer s >]
- #| [< '`+`; spaces _ >] -> [< 'PLUS; lexer s >]
- #| [< '`-`; spaces _ >] -> [< 'MINUS; lexer s >]
- #| [< '`*`; spaces _ >] -> [< 'TIMES; lexer s >]
- #| [< '`/`; spaces _ >] -> [< 'DIV; lexer s >]
- #| [< '`0`..`9` as c; (integer (int_of_digit c)) n; spaces _ >]
- # -> [< 'INT n; lexer s >];;
- lexer : char stream -> token stream = <fun>
-
- We assume there is no leading space in the input.
- Now, let us examine the language that we want to recognize. We shall have
- integers, infix arithmetic operations and parenthesized expressions. The
- BNF form of the grammar is:
-
-
-
- 71
-
-
-
- Expr ::= Expr + Expr
- | Expr - Expr
- | Expr * Expr
- | Expr / Expr
- | ( Expr )
- | INT
-
- The values computed by the parser will be abstract syntax trees (by contrast
- with concrete syntax, which is the input string or stream). Such trees
- belong to the following type:
-
- #type atree =
- # Int of int
- #| Plus of atree * atree
- #| Minus of atree * atree
- #| Mult of atree * atree
- #| Div of atree * atree;;
- Type atree defined.
-
- The Expr grammar is ambiguous. To make it unambiguous, we will adopt the
- usual precedences for arithmetic operators and assume that all operators
- associate to the left. Now, to use stream matching for parsing, we must
- take into account the fact that matching rules are chosen according to the
- behavior of the first component of each matching rule. This is predictive
- parsing, and, using well-known techniques, it is easy to rewrite the grammar
- above in such a way that writing the corresponding predictive parser becomes
- trivial. These techniques are described in [2], and consist in adding a
- non-terminal for each precedence level, and removing left-recursion. We
- obtain:
-
- Expr ::= Mult
- | Mult + Expr
- | Mult - Expr
-
- Mult ::= Atom
- | Atom * Mult
- | Atom / Mult
-
- Atom ::= INT
- | ( Expr )
-
- While removing left-recursion, we forgot about left associativity of
- operators. This is not a problem, as long as we build correct abstract
- trees.
- Since stream matching bases its choices on the first component of stream
- patterns, we cannot see the grammar above as a parser. We need a further
- transformation, factoring common prefixes of grammar rules (left-factor).
- We obtain:
-
- Expr ::= Mult RestExpr
-
-
-
- 72
-
-
-
- RestExpr ::= + Mult RestExpr
- | - Mult RestExpr
- | (* nothing *)
-
- Mult ::= Atom RestMult
-
- RestMult ::= * Atom RestMult
- | / Atom RestMult
- | (* nothing *)
-
- Atom ::= INT
- | ( Expr )
-
- Now, we can see this grammar as a parser (note that the order of rules
- becomes important, and empty productions must appear last). The shape of
- the parser is:
-
- let rec expr =
- let rec restexpr ? = function
- [< 'PLUS; mult ?; restexpr ? >] -> ?
- | [< 'MINUS; mult ?; restexpr ? >] -> ?
- | [< >] -> ?
- in function [< mult e1; restexpr ? >] -> ?
-
- and mult =
- let rec restmult ? = function
- [< 'TIMES; atom ?; restmult ? >] -> ?
- | [< 'DIV; atom ?; restmult ? >] -> ?
- | [< >] -> ?
- in function [< atom e1; restmult ? >] -> ?
-
- and atom = function
- [< 'INT n >] -> Int n
- | [< 'LPAR; expr e; 'RPAR >] -> e
-
- We used question marks where parameters, bindings and results still have to
- appear. Let us consider the expr function: clearly, as soon as e1 is
- recognized, we must be ready to build the leftmost subtree of the result.
- This leftmost subtree is either restricted to e1 itself, in case restexpr
- does not encounter any operator, or it is the tree representing the addition
- (or subtraction) of e1 and the expression immediately following the additive
- operator. Therefore, restexpr must be called with e1 as an intermediate
- result, and accumulate subtrees built from its intermediate result, the tree
- constructor corresponding to the operator and the last expression
- encountered. The body of expr becomes:
-
- let rec expr =
- let rec restexpr e1 = function
- [< 'PLUS; mult e2; restexpr (Plus (e1,e2)) e >] -> e
- | [< 'MINUS; mult e2; restexpr (Minus (e1,e2)) e >] -> e
- | [< >] -> e1
-
-
- 73
-
-
-
- in function [< mult e1; (restexpr e1) e2 >] -> e2
-
- Now, expr recognizes a product e1 (by mult), and applies (restexpr e1) to
- the rest of the stream. According to the additive operator encountered (if
- any), this function will apply mult which will return some e2. Then the
- process continues with Plus(e1,e2) as intermediate result. In the end, a
- correctly balanced tree will be produced (using the last rule of restexpr).
- With the same considerations on mult and restmult, we can complete the
- parser, obtaining:
-
- #let rec expr =
- # let rec restexpr e1 = function
- # [< 'PLUS; mult e2; (restexpr (Plus (e1,e2))) e >] -> e
- # | [< 'MINUS; mult e2; (restexpr (Minus (e1,e2))) e >] -> e
- # | [< >] -> e1
- #in function [< mult e1; (restexpr e1) e2 >] -> e2
- #
- #and mult =
- # let rec restmult e1 = function
- # [< 'TIMES; atom e2; (restmult (Mult (e1,e2))) e >] -> e
- # | [< 'DIV; atom e2; (restmult (Div (e1,e2))) e >] -> e
- # | [< >] -> e1
- #in function [< atom e1; (restmult e1) e2 >] -> e2
- #
- #and atom = function
- # [< 'INT n >] -> Int n
- #| [< 'LPAR; expr e; 'RPAR >] -> e;;
- expr : token stream -> atree = <fun>
- mult : token stream -> atree = <fun>
- atom : token stream -> atree = <fun>
-
- And we can now try our parser:
-
- #expr (lexer (stream_of_string "(1+2+3*4)-567"));;
- - : atree = Minus (Plus (Plus (Int 1, Int 2), Mult (Int 3, Int 4)), Int 567)
-
-
- 10.3.2 Parameters simulating inherited attributes
-
- In the previous example, the parsers restexpr and restmult take an abstract
- syntax tree e1 as argument and pass it down to the result through recursive
- calls such as (restexpr (Plus(e1,e2))). If we see such parsers as
- non-terminals (RestExpr from the grammar above) this parameter acts as an
- inherited attribute of the non-terminal. Synthesized attributes are
- simulated by the right hand sides of stream matching rules.
-
- 10.3.3 Higher-order parsers
-
- In the definition of expr, we may notice that the parsers expr and mult on
- the one hand and restexpr and restmult on the other hand have exactly the
- same structure. To emphasize this similarity, if we define parsers for
-
-
- 74
-
-
-
- additive (resp. multiplicative) operators by:
-
- #let addop = function
- # [< 'PLUS >] -> (function (x,y) -> Plus(x,y))
- #| [< 'MINUS >] -> (function (x,y) -> Minus(x,y))
- #and multop = function
- # [< 'TIMES >] -> (function (x,y) -> Mult(x,y))
- #| [< 'DIV >] -> (function (x,y) -> Div(x,y));;
- addop : token stream -> atree * atree -> atree = <fun>
- multop : token stream -> atree * atree -> atree = <fun>
-
- we can rewrite the expr parser as:
-
- #let rec expr =
- # let rec restexpr e1 = function
- # [< addop f; mult e2; (restexpr (f (e1,e2))) e >] -> e
- # | [< >] -> e1
- #in function [< mult e1; (restexpr e1) e2 >] -> e2
- #
- #and mult =
- # let rec restmult e1 = function
- # [< multop f; atom e2; (restmult (f (e1,e2))) e >] -> e
- # | [< >] -> e1
- #in function [< atom e1; (restmult e1) e2 >] -> e2
- #
- #and atom = function
- # [< 'INT n >] -> Int n
- #| [< 'LPAR; expr e; 'RPAR >] -> e;;
- expr : token stream -> atree = <fun>
- mult : token stream -> atree = <fun>
- atom : token stream -> atree = <fun>
-
- Now, we take advantage of these similarities in order to define a general
- parser for left-associative operators. Its name is left_assoc and is
- parameterized by a parser for operators and a parser for expressions:
-
- #let rec left_assoc op term =
- # let rec rest e1 = function
- # [< op f; term e2; (rest (f (e1,e2))) e >] -> e
- # | [< >] -> e1
- # in function [< term e1; (rest e1) e2 >] -> e2;;
- left_assoc :
- ('a stream -> 'b * 'b -> 'b) -> ('a stream -> 'b) -> 'a stream -
- > 'b = <fun>
-
- Now, we can redefine expr as:
-
- #let rec expr str = left_assoc addop mult str
- #and mult str = left_assoc multop atom str
- #and atom = function
- # [< 'INT n >] -> Int n
-
-
- 75
-
-
-
- #| [< 'LPAR; expr e; 'RPAR >] -> e;;
- expr : token stream -> atree = <fun>
- mult : token stream -> atree = <fun>
- atom : token stream -> atree = <fun>
-
- And we can now try our definitive parser:
-
- #expr (lexer (stream_of_string "(1+2+3*4)-567"));;
- - : atree = Minus (Plus (Plus (Int 1, Int 2), Mult (Int 3, Int 4)), Int 567)
-
- Parameterized parsers are useful for defining general parsers such as
- left_assoc that can be used with different instances. Another example of a
- useful general parser is the star parser defined as:
-
- #let rec star p = function
- # [< p x; (star p) l >] -> x::l
- #| [< >] -> [];;
- star : ('a stream -> 'b) -> 'a stream -> 'b list = <fun>
-
- The star parser iterates zero or more times its argument p and returns the
- list of results. We still have to be careful in using these general parsers
- because of the predictive nature of parsing. For example, star p will never
- successfully terminate if p has a rule for the empty stream pattern: this
- rule will make the second rule of star useless!
-
- 10.3.4 Example: parsing a non context-free language
-
- As an example of parsing with parameterized parsers, we shall build a parser
- for the language {wCw | w#(A|B)*}, which is known to be non context-free.
- First, let us define a type for this alphabet:
-
- #type token = A | B | C;;
- Type token defined.
-
- Given an input of the form wCw, the idea for a parser recognizing it is:
-
- - first, to recognize the sequence w with a parser wd (for word
- definition) returning information in order to build a parser recognizing
- only w;
-
- - then to recognize C;
-
- - and to use the parser built at the first step to recognize the sequence
- w.
-
- The definition of wd is as follows:
-
- #let rec wd = function
- # [< 'A; wd l >] -> (function [< 'A >] -> "a")::l
- #| [< 'B; wd l >] -> (function [< 'B >] -> "b")::l
- #| [< >] -> [];;
-
-
- 76
-
-
-
- wd : token stream -> (token stream -> string) list = <fun>
-
- The wu function (for word usage) builds a parser sequencing a list of
- parsers:
-
- #let rec wu = function
- # p::pl -> (function [< p x; (wu pl) l >] -> x^l)
- #| [] -> (function [< >] -> "");;
- wu : ('a stream -> string) list -> 'a stream -> string = <fun>
-
- The wu function builds, from a list of parsers pi, for i=1..n, a single
- parser
-
- function [<p1 x1;.. .;pn xn>] -> [x1;.. .;xn]
-
- which is the sequencing of parsers pi. The main parser w is:
-
- #let w = function [< wd l; 'C; (wu l) r >] -> r;;
- w : token stream -> string = <fun>
-
- #w [< 'A; 'B; 'B; 'C; 'A; 'B; 'B >];;
- - : string = "abb"
-
- #w [< 'C >];;
- - : string = ""
-
- In the previous parser, we used an intermediate list of parsers in order
- to build the second parser. We can redefine wd without using such a list:
-
- #let w =
- # let rec wd wr = function
- # [< 'A; (wd (function [< wr r; 'A >] -> r^"a")) p >] -> p
- # | [< 'B; (wd (function [< wr r; 'B >] -> r^"b")) p >] -> p
- # | [< >] -> wr
- # in function [< (wd (function [< >] -> "")) p; 'C; p str >] -> str;;
- w : token stream -> string = <fun>
-
- #w [< 'A; 'B; 'B; 'C; 'A; 'B; 'B >];;
- - : string = "abb"
-
- #w [< 'C >];;
- - : string = ""
-
- Here, wd is made local to w, and takes as parameter wr (for word recognizer)
- whose initial value is the parser with an empty stream pattern. This
- parameter accumulates intermediate results, and is delivered at the end of
- parsing the initial sequence w. After checking for the presence of C, it is
- used to parse the second sequence w.
-
-
-
-
-
- 77
-
-
-
- 10.4 Further reading
-
- A summary of the constructs over streams and of primitives over streams is
- given in [20].
- An alternative to parsing with streams and stream matching are the camllex
- and camlyacc programs.
- A detailed presentation of streams and stream matching following
- ``predictive parsing'' semantics can be found in [25], where alternative
- semantics are given with some possible implementations.
-
-
- Exercises
-
- Exercise 10.1 Define a parser for the language of prefix arithmetic
- expressions generated by the grammar:
-
- Expr ::= INT
- | + Expr Expr
- | - Expr Expr
- | * Expr Expr
- | / Expr Expr
-
- Use the lexical analyzer for arithmetic expressions given above. The result
- of the parser must be the integer resulting from the evaluation of the
- arithmetic expression, i.e. its type must be:
-
- token -> int
-
- Exercise 10.2 Enrich the type token above with a constructor IDENT of
- string for identifiers, and enrich the lexical analyzer for it to recognize
- identifiers built from alphabetic letters (upper or lowercase). Length of
- identifiers may be limited.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 78
-
-
-
-
-
-
-
-
-
- Chapter 11
-
-
-
- Standalone programs and separate compilation
-
-
-
- So far, we have used Caml Light in an interactive way. It is also possible
- to program in Caml Light in a batch-oriented way: writing source code in a
- file, having it compiled into an executable program, and executing the
- program outside of the Caml Light environment. Interactive use is great for
- learning the language and quickly testing new functions. Batch use is more
- convenient to develop larger programs, that should be usable without
- knowledge of Caml Light.
- Note for Macintosh users: batch compilation is not available in the
- standalone Caml Light application. It requires the MPW environment (see the
- Caml Light manual).
-
-
- 11.1 Standalone programs
-
- Standalone programs are composed of a sequence of phrases, contained in one
- or several text files. Phrases are the same as at toplevel: expressions,
- value declarations, type declarations, exception declarations, and
- directives. When executing the stand-alone program produced by the
- compiler, all phrases are executed in order. The values of expressions and
- declared global variables are not printed, however. A stand-alone program
- has to perform input and output explicitly.
- Here is a sample program, that prints the number of characters and the
- number of lines of its standard input, like the wc Unix utility.
-
- let chars = ref 0;;
- let lines = ref 0;;
- try
- while true do
- let c = input_char std_in in
- chars := !chars + 1;
- if c = `\n` then lines := !lines + 1 else ()
- done
- with End_of_file ->
- print_int !chars; print_string " characters, ";
- print_int !lines; print_string " lines.\n";
- exit 0
- ;;
-
-
- 79
-
-
-
- The input_char function reads the next character from an input channel
- (here, std_in, the channel connected to standard input). It raises
- exception End_of_file when reaching the end of the file. The exit function
- aborts the process. Its argument is the exit status of the process.
- Calling exit is absolutely necessary to ensure proper flushing of the output
- channels.
- Assume this program is in file count.ml. To compile it, simply run the
- camlc command from the command interpreter:
-
- camlc -o count count.ml
-
- The compiler produces an executable file count. You can now run count with
- the help of the "camlrun" command:
-
- camlrun count < count.ml
-
- This should display something like:
-
- 306 characters, 13 lines.
-
- Under Unix, the count file can actually be executed directly, just like any
- other Unix command, as in:
-
- ./count < count.ml
-
- This also works under MS-DOS, provided the executable file is given
- extension .exe. That is, if we compile count.ml as follows:
-
- camlc -o count.exe count.ml
-
- we can run count.exe directly, as in:
-
- count.exe < count.ml
-
- See the reference manual for more information on camlc.
-
-
- 11.2 Programs in several files
-
- It is possible to split one program into several source files, separately
- compiled. This way, local changes do not imply a full recompilation of the
- program. Let us illustrate that on the previous example. We split it in
- two modules: one that implements integer counters; another that performs
- the actual counting. Here is the first one, counter.ml:
-
- (* counter.ml *)
- type counter = { mutable val: int };;
- let new init = { val = init };;
- let incr c = c.val <- c.val + 1;;
- let read c = c.val;;
-
-
-
- 80
-
-
-
- Here is the source for the main program, in file main.ml.
-
- (* main.ml *)
- let chars = counter__new 0;;
- let lines = counter__new 0;;
- try
- while true do
- let c = input_char std_in in
- counter__incr chars;
- if c = `\n` then counter__incr lines else ()
- done
- with End_of_file ->
- print_int (counter__read chars); print_string " characters, ";
- print_int (counter__read lines); print_string " lines.\n";
- exit 0
- ;;
-
- Notice that references to identifiers defined in module counter.ml are
- prefixed with the name of the module, counter, and by __ (the ``long dash''
- symbol: two underscore characters). If we had simply entered new 0, for
- instance, the compiler would have assumed new is an identifier declared in
- the current module, and issued an ``undefined identifier'' error.
- Compiling this program requires two compilation steps, plus one final
- linking step.
-
- camlc -c counter.ml
- camlc -c main.ml
- camlc -o main counter.zo main.zo
-
- Running the program is done as before:
-
- camlrun main < counter.ml
-
- The -c option to camlc means ``compile only''; that is, the compiler should
- not attempt to produce a stand-alone executable program from the given file,
- but simply an object code file (files counter.zo, main.zo). The final
- linking phases takes the two .zo files and produces the executable main.
- Object files must be linked in the right order: for each global identifier,
- the module defining it must come before the modules that use it.
-
- Prefixing all external identifiers by the name of their defining module is
- sometimes tedious. Therefore, the Caml Light compiler provides a mechanism
- to omit the module__ part in external identifiers. The system maintains a
- list of ``default'' modules, called the currently opened modules, and
- whenever it encounters an identifier without the module__ part, it searches
- through the opened modules, to find one that defines this identifier.
- Searched modules always include the module being compiled (searched first),
- and some library modules of general use. In addition, two directives are
- provided to add and to remove modules from the list of opened modules:
-
- - #open "module";; to add module in front of the list;
-
-
- 81
-
-
-
- - #close "module";; to remove module from the list.
-
- For instance, we can rewrite the main.ml file above as:
-
- #open "counter";;
- let chars = new 0;;
- let lines = new 0;;
- try
- while true do
- let c = input_char std_in in
- incr chars;
- if c = `\n` then incr lines
- done
- with End_of_file ->
- print_int (read chars);
- print_string " characters, ";
- print_int (read lines);
- print_string " lines.\n";
- exit 0
- ;;
-
- After the #open "counter" directive, the identifier new automatically
- resolves to counters__new.
- If two modules, say mod1 and mod2, define both a global value f, then f in
- a client module client resolves to mod1__f if mod1 is opened but not mod2,
- or if mod1 has been opened more recently than mod2. Otherwise, it resolves
- to mod2__f. For instance, the two system modules int and float both define
- the infix identifier +. Both modules int and float are opened by default,
- but int comes first. Hence, x + y is understood as the integer addition,
- since + is resolved to int__+. But after the directive #open "float";;,
- module float comes first, and the identifier + is resolved to float__+.
-
-
- 11.3 Abstraction
-
- Some globals defined in a module are not intended to be used outside of this
- module. Then, it is good programming style not to export them outside of
- the module, so that the compiler can check they are not used in another
- module. Also, one may wish to export a data type abstractly, that is,
- without publicizing the structure of the type. This ensures that other
- modules cannot build or inspect objects of that type without going through
- one of the functions on that type exported in the defining module. This
- helps in writing clean, well-structured programs.
- The way to do that in Caml Light is to write an explicit interface, or
- output signature, specifying those identifiers that are visible from the
- outside. All other identifiers will remain local to the module. For global
- values, their types must be given by hand. The interface is contained in a
- file whose name is the module name, with extension .mli.
- Here is for instance an interface for the counter module, that abstracts
- the type counter:
-
-
-
- 82
-
-
-
- (* counter.mli *)
- type counter;; (* an abstract type *)
- value new : int -> counter
- and incr : counter -> unit
- and read : counter -> int;;
-
- Interfaces must be compiled separately. However, once the interface for
- module A has been compiled, any module B that uses A can be immediately
- compiled, even if the implementation of A is not yet compiled or even not
- yet written. Consider:
-
- camlc -c counter.mli
- camlc -c main.ml
- camlc -c counter.ml
- camlc -o main counter.zo main.zo
-
- The implementation main.ml could be compiled before counter.ml. The only
- requirement for compiling main.ml is the existence of counter.zi, the
- compiled interface of the counter module.
-
-
- Exercises
-
- Exercise 11.1 Complete the count command: it should be able to operate on
- several files, given on the command line. Hint: sys__command_line is an
- array of strings, containing the command-line arguments to the process.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 83
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Part III
-
-
-
- A complete example
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 84
-
-
-
-
-
-
-
-
-
- Chapter 12
-
-
-
- ASL: A Small Language
-
-
-
- We present in this chapter a simple language: ASL (A Small Language). This
- language is basically the l-calculus (the purely functional kernel of Caml)
- enriched with a conditional construct. The conditional must be a special
- construct, because our language will be submitted to call-by-value: thus,
- the conditional cannot be a function.
- ASL programs are built up from numbers, variables, functional expressions
- (l-abstractions), applications and conditionals. An ASL program consists of
- a global declaration of an identifier getting bound to the value of an
- expression. The primitive functions that are available are equality between
- numbers and arithmetic binary operations. The concrete syntax of ASL
- expressions can be described (ambiguously) as:
-
- Expr ::= INT
- | IDENT
- | "if" Expr "then" Expr "else" Expr "fi"
- | "(" Expr ")"
- | "\" IDENT "." Expr
-
- and the syntax of declarations is given as:
-
-
- Decl ::= "let" IDENT "be" Expr ";"
- | Expr ";"
-
-
- Arithmetic binary operations will be written in prefix position and will
- belong to the class IDENT. The \ symbol will play the role of the Caml
- keyword function.
- We start by defining the abstract syntax of ASL expressions and of ASL
- toplevel phrases. Then we define a parser in order to produce abstract
- syntax trees from the concrete syntax of ASL programs.
-
-
- 12.1 ASL abstract syntax trees
-
- We encode variable names by numbers. These numbers represent the binding
- depth of variables. For instance, the function of x returning x (the ASL
-
-
- 85
-
-
-
- identity function) will be represented as:
-
- Abs("x", Var 1)
-
- And the ASL application function which would be written in Caml:
-
- (function f -> (function x -> f(x)))
-
- would be represented as:
-
- Abs("f", Abs("x", App(Var 2, Var 1)))
-
- and should be viewed as the tree:
- Abs
- "f" / \ Abs
- "x" / \ App
- Var 2/\ Var 1
-
- Var n should be read as ``an occurrence of the variable bound by the nth
- abstraction node encountered when going toward the root of the abstract
- syntax tree''. In our example, when going from Var 2 to the root, the 2nd
- abstraction node we encounter introduces the "f" variable.
- The numbers encoding variables in abstract syntax trees of functional
- expressions are called ``De Bruijn(1) numbers''. The characters that we
- attach to abstraction nodes simply serve as documentation: they will not be
- used by any of the semantic analyses that we will perform on the trees. The
- type of ASL abstract syntax trees is defined by:
-
- #type asl = Const of int
- # | Var of int
- # | Cond of asl * asl * asl
- # | App of asl * asl
- # | Abs of string * asl
- #
- #and top_asl = Decl of string * asl;;
- Type asl defined.
- Type top_asl defined.
-
-
-
- 12.2 Parsing ASL programs
-
- Now we come to the problem of defining a concrete syntax for ASL programs
- and declarations.
-
- ------------------------------
- 1. They have been proposed by N.G. De Bruijn in [10] in order to facilitate
- the mechanical treatment of l-calculus terms.
-
-
-
- 86
-
-
-
- The choice of the concrete aspect of the programs is simply a matter of
- taste. The one we choose here is close to the syntax of l-calculus (except
- that we will use the backslash character because there is no ``l'' on our
- keyboards). We will use the curried versions of equality and arithmetic
- functions. We will also use a prefix notation (a la Lisp) for their
- application. We will write ``+ (+ 1 2) 3'' instead of ``(1+2)+3''. The
- ``if e1 then e2 else e3'' construct will be written ``if e1 then e2 else e3
- fi'', and will return the then part when e1 is different from 0 (0 acts
- thus as falsity in ASL conditionals).
-
- 12.2.1 Lexical analysis
-
- The concrete aspect of ASL programs will be either declarations of the form:
-
- let identifier be expression;
-
- or:
-
- expression;
-
- which will be understood as:
-
- let it be expression;
-
- The tokens produced by the lexical analyzer will represent the keywords let,
- be, if and else, the \ binder, the dot, parentheses, integers, identifiers,
- arithmetic operations and terminating semicolons. We reuse here most of the
- code that we developed in chapter 10 or in the answers to its exercises.
- Skipping blank spaces:
-
- #let rec spaces = function
- # [< '` `|`\t`|`\n`; spaces _ >] -> ()
- #| [< >] -> ();;
- spaces : char stream -> unit = <fun>
-
- The type of tokens is given by:
-
- #type token = LET | BE | LAMBDA | DOT | LPAR | RPAR
- # | IF | THEN | ELSE | FI | SEMIC
- # | INT of int | IDENT of string;;
- Type token defined.
-
- Integers:
-
- #let int_of_digit = function
- # `0`..`9` as c -> (int_of_char c) - (int_of_char `0`)
- #| _ -> raise (Failure "not a digit");;
- int_of_digit : char -> int = <fun>
-
- #let rec integer n = function
-
-
- 87
-
-
-
- # [< ' `0`..`9` as c; (integer (10*n + int_of_digit c)) r >] -> r
- #| [< >] -> INT n;;
- integer : int -> char stream -> token = <fun>
-
- We restrict ASL identifiers to be composed of lowercase letters, the eight
- first being significative. An explanation about the ident function can be
- found in the chapter dedicated to the answers to exercises (chapter 17).
- The function given here is slightly different and tests its result in order
- to see wether it is a keyword (let, be, ...) or not:
-
- #let ident_buf = make_string 8 ` `;;
- ident_buf : string = " "
-
- #let rec ident len = function
- # [< ' `a`..`z` as c;
- # (if len >= 8 then ident len
- # else begin
- # set_nth_char ident_buf len c;
- # ident (succ len)
- # end) s >] -> s
- #| [< >] -> (match sub_string ident_buf 0 len
- # with "let" -> LET
- # | "be" -> BE
- # | "if" -> IF
- # | "then" -> THEN
- # | "else" -> ELSE
- # | "fi" -> FI
- # | s -> IDENT s);;
- ident : int -> char stream -> token = <fun>
-
- A reasonable lexical analyzer would use a hash table to recognize keywords
- faster.
- Primitive operations are recognized by the following function, which also
- detects illegal operators and ends of input:
-
- #let oper = function
- # [< '`+`|`-`|`*`|`/`|`=` as c >] -> IDENT(make_string 1 c)
- #| [< 'c >] -> prerr_string "Illegal character: ";
- # prerr_endline (char_for_read c);
- # raise (Failure "ASL parsing")
- #| [< >] -> prerr_endline "Unexpected end of input";
- # raise (Failure "ASL parsing");;
- oper : char stream -> token = <fun>
-
- The lexical analyzer has the same structure as the one given in chapter 10
- except that leading blanks are skipped.
-
- #let rec lexer str = spaces str;
- #match str with
- # [< '`(`; spaces _ >] -> [< 'LPAR; lexer str >]
- #| [< '`)`; spaces _ >] -> [< 'RPAR; lexer str >]
-
-
- 88
-
-
-
- #| [< '`\\`; spaces _ >] -> [< 'LAMBDA; lexer str >]
- #| [< '`.`; spaces _ >] -> [< 'DOT; lexer str >]
- #| [< '`;`; spaces _ >] -> [< 'SEMIC; lexer str >]
- #| [< '`0`..`9` as c;
- # (integer (int_of_digit c)) tok;
- # spaces _ >] -> [< 'tok; lexer str >]
- #| [< '`a`..`z` as c;
- # (set_nth_char ident_buf 0 c; ident 1) tok;
- # spaces _ >] -> [< 'tok; lexer str >]
- #| [< oper tok; spaces _ >] -> [< 'tok; lexer str >]
- #;;
- lexer : char stream -> token stream = <fun>
-
- The lexical analyzer returns a stream of tokens that the parser will receive
- as argument.
-
- 12.2.2 Parsing
-
- The final output of our parser will be abstract syntax trees of type asl or
- top_asl. This implies that we will detect unbound identifiers at
- parse-time. In this case, we will raise the Unbound exception defined as:
-
- #exception Unbound of string;;
- Exception Unbound defined.
-
- We also need a function which will compute the binding depths of variables.
- That function simply looks for the position of the first occurrence of a
- variable name in a list. It will raise Unbound if there is no such
- occurrence.
-
- #let binding_depth s rho =
- # let rec bind n = function
- # [] -> raise (Unbound s)
- # | t::l -> if s = t then Var n else bind (n+1) l
- # in bind 1 rho
- #;;
- binding_depth : string -> string list -> asl = <fun>
-
- We also need a global environment, containing names of already bound
- identifiers. The global environment contains predefined names for the
- equality and arithmetic functions. We represent the global environment as a
- reference since each ASL declaration will augment it with a new name.
-
- #let init_env = ["+";"-";"*";"/";"="];;
- init_env : string list = ["+"; "-"; "*"; "/"; "="]
-
- #let global_env = ref init_env;;
- global_env : string list ref = ref ["+"; "-"; "*"; "/"; "="]
-
- We now give a parsing function for ASL programs. Blanks at the beginning of
- the string are skipped.
-
-
- 89
-
-
-
- #let rec top = function
- # [< 'LET; 'IDENT id; 'BE; expression e; 'SEMIC >] -> Decl(id,e)
- # | [< expression e; 'SEMIC >] -> Decl("it",e)
- #
- #and expression = function
- # [< (expr !global_env) e >] -> e
- #
- #and expr rho =
- # let rec rest e1 = function
- # [< (atom rho) e2; (rest (App(e1,e2))) e >] -> e
- # | [< >] -> e1
- # in function
- # [< 'LAMBDA; 'IDENT id; 'DOT; (expr (id::rho)) e >] -> Abs(id,e)
- # | [< (atom rho) e1; (rest e1) e2 >] -> e2
- #
- #and atom rho = function
- # [< 'IDENT id >] ->
- # (try binding_depth id rho with Unbound s ->
- # print_string "Unbound ASL identifier: ";
- # print_string s; print_newline();
- # raise (Failure "ASL parsing"))
- # | [< 'INT n >] -> Const n
- # | [< 'IF; (expr rho) e1; 'THEN; (expr rho) e2;
- # 'ELSE; (expr rho) e3; 'FI >] -> Cond(e1,e2,e3)
- # | [< 'LPAR; (expr rho) e; 'RPAR >] -> e;;
- top : token stream -> top_asl = <fun>
- expression : token stream -> asl = <fun>
- expr : string list -> token stream -> asl = <fun>
- atom : string list -> token stream -> asl = <fun>
-
- The complete parser that we will use reads a string, converts it into a
- stream, and produces the token stream that is parsed:
-
- #let parse_top s = top(lexer(stream_of_string s));;
- parse_top : string -> top_asl = <fun>
-
- Let us try our grammar (we do not augment the global environment at each
- declaration: this will be performed after the semantic treatment of ASL
- programs). We need to write double \ inside strings, since \ is the string
- escape character.
-
- #parse_top "let f be \\x.x;";;
- - : top_asl = Decl ("f", Abs ("x", Var 1))
-
- #parse_top "let x be + 1 ((\\x.x) 2);";;
- - : top_asl =
- Decl ("x", App (App (Var 1, Const 1), App (Abs ("x", Var 1), Const 2)))
-
- Unbound identifiers and undefined operators are correctly detected:
-
- #parse_top "let y be g 3;";;
-
-
- 90
-
-
-
- Unbound ASL identifier: g
- Uncaught exception: Failure "ASL parsing"
-
- #parse_top "f (if 0 then + else - fi) 2 3;";;
- Unbound ASL identifier: f
- Uncaught exception: Failure "ASL parsing"
-
- #parse_top "^ x y;";;
- Illegal character: ^
- Uncaught exception: Failure "ASL parsing"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 91
-
-
-
-
-
-
-
-
-
- Chapter 13
-
-
-
- Untyped semantics of ASL programs
-
-
-
- In this section, we give a semantic treatment of ASL programs. We will use
- dynamic typechecking, i.e. we will test the type correctness of programs
- during their interpretation.
-
-
- 13.1 Semantic values
-
- We need a type for ASL semantic values (representing results of
- computations). A semantic value will be either an integer, or a Caml
- functional value from ASL values to ASL values.
-
- #type semval = Constval of int
- # | Funval of (semval -> semval);;
- Type semval defined.
-
- We now define two exceptions. The first one will be used when we encounter
- an ill-typed program and will represent run-time type errors. The other one
- is helpful for debugging: it will be raised when our interpreter (semantic
- function) goes into an illegal situation.
- The following two exceptions will be raised in case of run-time ASL type
- error, and in case of bug of our semantic treatment:
-
- #exception Illtyped;;
- Exception Illtyped defined.
-
- #exception SemantBug of string;;
- Exception SemantBug defined.
-
- We must give a semantic value to our basic functions (equality and
- arithmetic operations). The next function transforms a Caml function into
- an ASL value.
-
- #let init_semantics caml_fun =
- # Funval
- # (function Constval n ->
- # Funval(function Constval m -> Constval(caml_fun n m)
- # | _ -> raise Illtyped)
-
-
- 92
-
-
-
- # | _ -> raise Illtyped);;
- init_semantics : (int -> int -> int) -> semval = <fun>
-
- Now, associate a Caml Light function to each ASL predefined function:
-
- #let caml_function = function
- # "+" -> prefix +
- # | "-" -> prefix -
- # | "*" -> prefix *
- # | "/" -> prefix /
- # | "=" -> (fun n m -> if n=m then 1 else 0)
- # | s -> raise (SemantBug "Unknown primitive");;
- caml_function : string -> int -> int -> int = <fun>
-
- In the same way as, for parsing, we needed a global environment from which
- the binding depth of identifiers was computed, we need a semantic
- environment from which the interpreter will fetch the value represented by
- identifiers. The global semantic environment will be a reference on the
- list of predefined ASL values.
-
- #let init_sem = map (fun x -> init_semantics(caml_function x))
- # init_env;;
- init_sem : semval list =
- [Funval <fun>; Funval <fun>; Funval <fun>; Funval <fun>; Funval <fun>]
-
- #let global_sem = ref init_sem;;
- global_sem : semval list ref =
- ref [Funval <fun>; Funval <fun>; Funval <fun>; Funval <fun>; Funval <fun>]
-
-
-
- 13.2 Semantic functions
-
- The semantic function is the interpreter itself. There is one for
- expressions and one for declarations. The one for expressions computes the
- value of an ASL expression from an environment rho. The environment will
- contain the values of globally defined ASL values or of temporary ASL
- values. It is organized as a list, and the numbers representing variable
- occurrences will be used as indices into the environment.
-
- #let rec nth n = function
- # [] -> raise (Failure "nth")
- # | x::l -> if n=1 then x else nth (n-1) l;;
- nth : int -> 'a list -> 'a = <fun>
-
- #let rec semant rho =
- # let rec sem = function
- # Const n -> Constval n
- # | Var(n) -> nth n rho
- # | Cond(e1,e2,e3) ->
- # (match sem e1 with Constval 0 -> sem e3
-
-
- 93
-
-
-
- # | Constval n -> sem e2
- # | _ -> raise Illtyped)
- # | Abs(_,e') -> Funval(fun x -> semant (x::rho) e')
- # | App(e1,e2) -> (match sem e1
- # with Funval(f) -> f (sem e2)
- # | _ -> raise Illtyped)
- # in sem
- #;;
- semant : semval list -> asl -> semval = <fun>
-
- The main function must be able to treat an ASL declaration, evaluate it, and
- update the global environments (global_env and global_sem).
-
- #let semantics = function Decl(s,e) ->
- # let result = semant !global_sem e
- # in global_env := s::!global_env;
- # global_sem := result::!global_sem;
- # print_string "ASL Value of ";
- # print_string s;
- # print_string " is ";
- # (match result with
- # Constval n -> print_int n
- # | Funval f -> print_string "<fun>");
- # print_newline();;
- semantics : top_asl -> unit = <fun>
-
-
-
- 13.3 Examples
-
- #semantics (parse_top "let f be \\x. + x 1;");;
- ASL Value of f is <fun>
- - : unit = ()
-
- #semantics (parse_top "let i be \\x. x;");;
- ASL Value of i is <fun>
- - : unit = ()
-
- #semantics (parse_top "let x be i (f 2);");;
- ASL Value of x is 3
- - : unit = ()
-
- #semantics (parse_top "let y be if x then (\\x.x) else 2 fi 0;");;
- ASL Value of y is 0
- - : unit = ()
-
-
-
-
-
-
-
-
- 94
-
-
-
-
-
-
-
-
-
- Chapter 14
-
-
-
- Encoding recursion
-
-
-
-
- 14.1 Fixpoint combinators
-
- We have seen that we do not have recursion in ASL. However, it is possible
- to encode recursion by defining a fixpoint combinator. A fixpoint
- combinator is a function F such that:
-
- F M is equivalent to M (F M) modulo the evaluation rules.
-
- for any expression M. A consequence of the equivalence given above is that
- fixpoint combinators can encode recursion. Let us note M=N if expressions
- M and N are equivalent modulo the evaluation rules. Then, consider ffact to
- be the functional obtained from the body of the factorial function by
- abstracting (i.e. using as a parameter) the fact identifier, and fix an
- arbitrary fixpoint combinator. We have:
-
- ffact is \fact.(\n. if = n 0 then 1 else * n (fact (- n 1)) fi)
-
- Now, let us consider the expression E=(fix ffact) 3. Using our intuition
- about the evaluation rules, and the definition of a fixpoint combinator, we
- obtain:
- E= ffact (fix ffact) 3
- Replacing ffact by its definition, we obtain:
- E= (\fact.(\n. if = n 0 then 1 else * n (fact (- n 1)) fi)) (fix
- ffact) 3
- We can now pass the two arguments to the first abstraction, instantiating
- fact and n respectively to fix ffact and 3:
- E= if = 3 0 then 1 else * 3 (fix ffact (- 3 1)) fi
- We can now reduce the conditional into its else branch:
- E= * 3 (fix ffact (- 3 1))
- Continuing this way, we eventually compute:
- E= * 3 (* 2 (* 1 1)) = 6
- This is the expected behavior of the factorial function. Given an
- appropriate fixpoint combinator fix, we could define the factorial function
- as fix ffact, where ffact is the expression above.
-
-
-
-
- 95
-
-
-
- Unfortunately, when using call-by-value, any application of a fixpoint
- combinator F such that:
-
- F M evaluates to M (F M)
-
- leads to non-termination of the evaluation (because evaluation of (F M)
- leads to evaluating (M (F M)), and thus (F M) again).
- We will use the Z fixpoint combinator defined by:
-
- Z=lf.((lx. f (ly. (x x) y))(lx. f (ly. (x x) y)))
-
- The fixpoint combinator Z has the particularity of being usable under
- call-by-value evaluation regime (in order to check that fact, it is
- necessary to know the evaluation rules of l-calculus). Since the name z
- looks more like an ordinary parameter name, we will call fix the ASL
- expression corresponding to the Z fixpoint combinator.
-
- #semantics (parse_top
- # "let fix be \\f.((\\x.f(\\y.(x x) y))(\\x.f(\\y.(x x) y)));");;
- ASL Value of fix is <fun>
- - : unit = ()
-
- We are now able to define the ASL factorial function:
-
- #semantics (parse_top
- # "let fact be fix (\\f.(\\n. if = n 0 then 1
- # else * n (f (- n 1)) fi));");;
- ASL Value of fact is <fun>
- - : unit = ()
-
- #semantics (parse_top "fact 8;");;
- ASL Value of it is 40320
- - : unit = ()
-
- and the ASL Fibonacci function:
-
- #semantics (parse_top
- # "let fib be fix (\\f.(\\n. if = n 1 then 1
- # else if = n 2 then 1
- # else + (f (- n 1)) (f (-
- n 2)) fi fi));");;
- ASL Value of fib is <fun>
- - : unit = ()
-
- #semantics (parse_top "fib 9;");;
- ASL Value of it is 34
- - : unit = ()
-
-
-
-
-
-
- 96
-
-
-
- 14.2 Recursion as a primitive construct
-
- Of course, in a more realistic prototype, we would extend the concrete and
- abstract syntaxes of ASL in order to support recursion as a primitive
- construct. We do not do it here because we want to keep ASL simple. This
- is an interesting non trivial exercise!
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 97
-
-
-
-
-
-
-
-
-
- Chapter 15
-
-
-
- Static typing, polymorphism and type synthesis
-
-
-
- We now want to perform static typechecking of ASL programs, that is, to
- complete typechecking before evaluation, making run-time type tests
- unnecessary during evaluation of ASL programs.
- Furthermore, we want to have polymorphism (i.e. allow the identity
- function, for example, to be applicable to any kind of data).
- Type synthesis may be seen as a game. When learning a game, we must:
-
- - learn the rules (what is allowed, and what is forbidden);
-
- - learn a winning strategy.
-
- In type synthesis, the rules of the game are called a type system, and the
- winning strategy is the typechecking algorithm.
- In the following sections, we give the ASL type system, the algorithm and
- an implementation of that algorithm. Most of this presentation is borrowed
- from [7].
-
-
- 15.1 The type system
-
- We study in this section a type system for the ASL language. Then, we
- present an algorithm performing the type synthesis of ASL programs, and its
- Caml Light implementation. Because of subtle aspects of the notation used
- (inference rules), and since some important mathematical notions, such as
- unification of first-order terms, are not presented here, this chapter may
- seem obscure at first reading.
- The type system we will consider for ASL has been first given by Milner
- [27] for a subset of the ML language (in fact, a superset of l-calculus). A
- type is either:
-
- - the type Number;
-
- - or a type variable (a, b, ...);
-
- - or t1->t2, where t1 and t2 are types.
-
- In a type, a type variable is an unknown, i.e. a type that we are
-
-
- 98
-
-
-
- computing. We will use t, t', t1, ..., as metavariables(1) representing
- types. This notation is important: we shall use other greek letters to
- denote other notions in the following sections.
-
- Example (a->Number)->b->b is a type.
- O
- A type scheme, is a type where some variables are distinguished as being
- generic. We can represent type schemes by:
-
- for all a1,.. .,an.t where t is a type.
-
- Example
-
- for all a.(a->Number)->b->b and (a->Number)->b->b are type schemes.
- O We will use s, s', s1, ..., as metavariables representing type schemes.
- # #
- We may also write type schemes as for all a.t. In this case, a represent a
- (possibly empty) set of generic type variables. When the set of generic
- variables is empty, we write for all .t or simply t.
- We will write FV(s) for the set of unknowns occurring in the type scheme
- s. Unknowns are also called free variables (they are not bound by a
- for all quantifier).
- We also write BV(s) (bound type variables of s) for the set of type
- variables occurring in s which are not free (i.e. the set of variables
- universally quantified). Bound type variables are also said to be generic.
-
- Example If s denotes the type scheme for all a.(a->Number)->b->b, then we
- have:
-
- FV(s)= {b}
-
- and
-
- BV(s)= {a}
-
- O A substitution instance s' of a type scheme s is the type scheme S(s)
- where S is a substitution of types for free type variables appearing in s.
- When applying a substitution to a type scheme, a renaming of some bound type
- variables may become necessary, in order to avoid the capture of a free type
- variable by a quantifier.
-
- Example
-
- - If s denotes for all b.(b->a)->a and s' is
- for all b.(b->(g->g))->(g->g), then s' is a substitution instance of s
-
- ------------------------------
- 1. A metavariable should not be confused with a variable or a type
- variable.
-
-
-
- 99
-
-
-
- because s'= S(s) where S={a<-(g->g)}, i.e. S substitutes the type g->g
- for the variable a.
-
- - If s denotes for all b.(b->a)->a and s' is
- for all d.(d->(b->b))->(b->b), then s' is a substitution instance of s
- because s'= S(s) where S={a<-(b->b)}. In this case, the renaming of b
- into d was necessary: we did not want the variable b introduced by S to
- be captured by the universal quantification for all b.
-
- O The type scheme s'=for all b1 ... bm.t' is said to be a generic instance
- of s=for all a1 ... an.t if there exists a substitution S such that:
-
- - the domain of S is included in {a1,... ,an};
-
- - t'= S(t);
-
- - no bi occurs free in s.
-
- In other words, a generic instance of a type scheme is obtained by giving
- more precise values to some generic variables, and (possibly) quantifying
- some of the new type variables introduced.
-
- Example If s=for all b.(b->a)->a, then s' =for all g.((g->g)->a)->a is a
- generic instance of s. We changed b into (g->g), and we universally
- quantified on the newly introduced type variable g.
- O
- We express this type system by means of inference rules. An inference
- rule is written as a fraction:
-
- - the numerator is called the premisses;
-
- - the denominator is called the conclusion.
-
- An inference rule:
-
- P1 ... Pn
- -----------
- C
-
- may be read in two ways:
-
- - ``If P1, ...and Pn , then C''.
-
- - ``In order to prove C, it is sufficient to prove P1, ...and Pn ''.
-
- An inference rule may have no premise: such a rule will be called an axiom.
-
-
-
-
-
-
- 100
-
-
-
- A complete proof will be represented by a proof tree of the following form:
-
- Pm ... ... Pk
- 1 l
- ---- ----
-
- .
- .
- .
- ------... ------
-
- P1 ... P1
- 1 n
- ----------------
-
- C
-
- where the leaves of the tree (Pm, ..., Pk) are instances of axioms.
- 1 l
- In the premisses and the conclusions appear judgements having the form:
-
- G |- e :s
-
- Such a judgement should be read as ``under the typing environment G, the
- expression e has type scheme s''. Typing environments are sets of typing
- hypotheses of the form x:s where x is an identifier name and s is a type
- scheme: typing environments give types to the variables occurring free
- (i.e. unbound) in the expression.
- When typing l-calculus terms, the typing environment is managed as a stack
- (because identifiers possess local scopes). We represent that fact in the
- presentation of the type system by removing the typing hypothesis concerning
- an identifier name x (if such a typing hypothesis exists) before adding a
- new typing hypothesis concerning x.
- We write G-G(x) for the set ot typing hypotheses obtained from G by
- removing the typing hypothesis concerning x (if it exists).
- Any numeric constant is of type Number:
-
- -------------------- (NUM)
- G |- Const n :Number
-
- We obtain type schemes for variables from the typing environment G:
-
- ---------------------- (TAUT)
- G U {x: s} |- Var x: s
-
- It is possible to instantiate type schemes. The ``GenInstance'' relation
- represents generic instantiation.
-
- G |- e :s s' =GenInstance(s)
- ----------------------------- (INST)
- G |- e:s'
-
- It is possible to generalize type schemes with respect to variables that do
- not occur free in the set of hypotheses:
-
- G |- e:s a not in FV(G)
- ------------------------- (GEN)
- G |- e:for all a.s
-
-
-
- 101
-
-
-
- Typing a conditional:
-
- G |- e1 :Number G |- e2 :t G |- e3 :t
- --------------------------------------- (IF)
- G |- (if e1 then e2 else e3 fi): t
-
- Typing an application:
-
- G |- e1 :t->t' G |- e2 :t
- --------------------------- (APP)
- G |- (e1 e2) :t'
-
- Typing an abstraction:
-
- (G- G(x)) U {x:t} |- e :t'
- --------------------------- (ABS)
- G |- (lx .e):t->t'
-
- The special rule below is the one that introduces polymorphism: this
- corresponds to the ML let construct.
-
- G |- e :s (G -G(x)) U {x: s} |- e': t
- -------------------------------------- (LET)
- G |- (lx .e') e :t
-
- This type system has been proven to be semantically sound, i.e. the
- semantic value of a well-typed expression (an expression admitting a type)
- cannot be an error value due to a type error. This is usually expressed as:
-
- Well-typed programs cannot go wrong.
-
- This fact implies that a clever compiler may produce code without any
- dynamic type test for a well-typed expression.
-
- Example Let us check, using the set of rules above, that the following is
- true:
-
- 0 |- let f =lx.x in f f :b->b
-
- In order to do so, we will use the equivalence between the let construct and
- an application of an immediate abstraction (i.e. an expression having the
- following shape: (lv.M)N. The (LET) rule will be crucial: without it, we
- could not check the judgement above.
-
- --------------------------- (TAU---------------------- (TAU---------------------- (TAUT)
- {x: a} |- x: a G |- f :for all a.a->a G |- f: for all a.a->a
- --------------------------- (ABS---------------------- (INS---------------------- (INST)
- 0 |- (lx.x) :a->a G |- f :(b->b)->(b->b) G |- f: b->b
- --------------------------- (GE---------------------------------------------------(APP)
- 0 |- (lx.x) :for all a.a->a G ={f :for all a.a->a} |- f f :b->b
- ----------------------------------------------------------------------------------- (LET)
- 0 |- let f =lx.x in f f :b->b
-
-
- O
-
-
- 102
-
-
-
- This type system does not tell us how to find the best type for an
- expression. But what is the best type for an expression? It must be such
- that any other possible type for that expression is more specific; in other
- words, the best type is the most general.
-
-
- 15.2 The algorithm
-
- How do we find the most general type for an expression of our language? The
- problem with the set of rules above, is that we could instantiate and
- generalize types at any time, introducing type schemes, while the most
- important rules (application and abstraction) used only types.
- Let us write a new set of inference rules that we will read as an
- algorithm (close to a Prolog program):
- Any numeric constant is of type Number:
-
- -------------------- (NUM)
- G |- Const n :Number
-
- The types of identifiers are obtained by taking generic instances of type
- schemes appearing in the typing environment. These generic instances will
- be types and not type schemes: this restriction appears in the rule below,
- where the type t is expected to be a generic instance of the type scheme s.
- As it is presented (belonging to a deduction system), the following rule
- will have to anticipate the effect of the equality constraints between types
- in the other rules (multiple occurrences of a type metavariable), when
- choosing the instance t.
-
- t =GenInstance(s)
- ---------------------- (INST)
- G U {x: s} |- Var x: t
-
- When we read this set of inference rules as an algorithm, the (INST) rule
- will be implemented by:
-
- 1. taking as t the ``most general generic instance'' of s that is a type
- (the rule requires t to be a type and not a type scheme),
-
- 2. making t more specific by unification [32] when encountering equality
- constraints.
-
- Typing a conditional requires only the test part to be of type Number, and
- both alternatives to be of the same type t. This is an example of equality
- constraint between the types of two expressions.
-
- G |- e1 :Number G |- e2 :t G |- e3 :t
- --------------------------------------- (COND)
- G |- (if e1 then e2 else e3 fi): t
-
- Typing an application produces also equality constraints that are to be
-
-
-
-
-
-
- 103
-
-
-
- solved by unification:
-
- G |- e1 :t->t' G |- e2 :t
- --------------------------- (APP)
- G |- (e1 e2) :t'
-
- Typing an abstraction ``pushes'' a typing hypothesis for the abstracted
- identifier: unification will make it more precise during the typing of the
- abstraction body:
-
- (G -G(x)) U {x: for all .t} |- e: t'
- ------------------------------------ (ABS)
- G |- (lx .e):t->t'
-
- Typing a let construct involves a generalization step: we generalize as
- much as possible.
-
- G |- e: t' {a ,.. .,a } =FV(t') -FV(G) (G- G(x)) U {x: for all a ... a .t'} |- e': t
- 1 n 1 n
- -------------------------------------------------------------------------------------- (LET)
- G |- (lx .e') e: t
-
- This set of inference rules represents an algorithm because there is
- exactly one conclusion for each syntactic ASL construct (giving priority to
- the (LET) rule over the regular application rule). This set of rules may be
- read as a Prolog program.
- This algorithm has been proven to be:
-
- - syntactically sound: when the algorithm succeeds on an expression e and
- returns a type t, then e:t.
-
- - complete: if an expression e possesses a type t, then the algorithm
- will find a type t' such that t is an instance of t'. The returned
- type t' is thus the most general type of e.
-
- Example
-
- We compute the type that we simply checked in our last example. What is
- drawn below is the result of the type synthesis: in fact, we run our
- algorithm with type variables representing unknowns, modifying the previous
- applications of the (INST) rule when necessary (i.e. when encountering an
- equality constraint). This is valid, since it can be proved that the
- correction of the whole deduction tree is preserved by substitution of types
- for type variables. In a real implementation of the algorithm, the data
- structures representing types will be submitted to a unification mechanism.
-
- ----------------- (INS---------------------- (INS------------(INST)
- {x :a} |- x :a G |- f: (b->b)->(b->b) G |- f:b->b
- ----------------- (AB-----------------------------------------(APP)
- 0 |- (lx.x): a->a G= {f: for all a.a->a} |- f f: b->b
- --------------------------------------------------------------- (LET)
- 0 |- let f =lx.x in f f :b->b
-
-
-
-
- 104
-
-
-
- Once again, this expression is not typable without the use of the (LET)
- rule: an error would occur because of the type equality constraints between
- all occurrences of a variable bound by a ``l''. In an effective
- implementation, a unification error would occur.
- O
- We may notice from the example above that the algorithm is
- syntax-directed: since, for a given expression, a type deduction for that
- expression uses exactly one rule per sub-expression, the deduction possesses
- the same structure as the expression. We can thus reconstruct the ASL
- expression from its type deduction tree. From the deduction tree above, if
- we write upper rules as being ``arguments'' of the ones below and if we
- annotate the applications of the (INST) and (ABS) rules by the name of the
- subject variable, we obtain:
-
- LETf(ABSx(INSTx), APP(INSTf, INSTf))
-
- This is an illustration of the ``types-as-propositions and
- programs-as-proofs'' paradigm, also known as the ``Curry-Howard
- isomorphism'' (cf. [14]). In this example, we can see the type of the
- considered expression as a proposition and the expression itself as the
- proof, and, indeed, we recognize the expression as the deduction tree.
-
-
- 15.3 The ASL type-synthesizer
-
- We now implement the set of inference rules given above.
- We need:
-
- - a Caml representation of ASL types and type schemes,
-
- - a management of type environments,
-
- - a unification procedure,
-
- - a typing algorithm.
-
- 15.3.1 Representation of ASL types and type schemes
-
- We first need to define a Caml type for our ASL type data structure:
-
- #type asl_type = Unknown
- # | Number
- # | TypeVar of vartype
- # | Arrow of asl_type * asl_type
- #and vartype = {Index:int; mutable Value:asl_type}
- #and asl_type_scheme = Forall of int list * asl_type ;;
- Type asl_type defined.
- Type vartype defined.
- Type asl_type_scheme defined.
-
- The Unknown ASL type is not really a type: it is the initial value of fresh
-
-
- 105
-
-
-
- ASL type variables. We will consider as abnormal a situation where Unknown
- appears in place of a regular ASL type. In such situations, we will raise
- the following exception:
-
- #exception TypingBug of string;;
- Exception TypingBug defined.
-
- Type variables are allocated by the new_vartype function, and their global
- counter (a local reference) is reset by reset_vartypes.
-
- #let new_vartype, reset_vartypes =
- #(* Generating and resetting unknowns *)
- # let counter = ref 0
- # in (function () -> counter := !counter + 1;
- # {Index = !counter; Value = Unknown}),
- # (function () -> counter := 0);;
- new_vartype : unit -> vartype = <fun>
- reset_vartypes : unit -> unit = <fun>
-
-
- 15.3.2 Destructive unification of ASL types
-
- We will need to ``shorten'' type variables: since they are indirections to
- ASL types, we need to follow these indirections in order to obtain the type
- that they represent. For the sake of efficiency, we take advantage of this
- operation to replace multiple indirections by single indirections
- (shortening).
-
- #let rec shorten t =
- # match t with
- # TypeVar {Index=_; Value=Unknown} -> t
- # | TypeVar ({Index=_;
- # Value=TypeVar {Index=_;
- # Value=Unknown} as tv}) -> tv
- # | TypeVar ({Index=_; Value=TypeVar tv1} as tv2)
- # -> tv2.Value <- tv1.Value; shorten t
- # | TypeVar {Index=_; Value=t'} -> t'
- # | Unknown -> raise (TypingBug "shorten")
- # | t' -> t';;
- shorten : asl_type -> asl_type = <fun>
-
- An ASL type error will be represented by the following exception:
-
- #exception TypeClash of asl_type * asl_type;;
- Exception TypeClash defined.
-
- We will need unification on ASL types with occur-check. The following
- function implements occur-check:
-
- #let occurs {Index=n;Value=_} =
- # let rec occrec = function
-
-
- 106
-
-
-
- # TypeVar{Index=m;Value=_} -> (n=m)
- # | Number -> false
- # | Arrow(t1,t2) -> (occrec t1) or (occrec t2)
- # | Unknown -> raise (TypingBug "occurs")
- # in occrec
- #;;
- occurs : vartype -> asl_type -> bool = <fun>
-
- The unification function: implements destructive unification. Instead of
- returning the most general unifier, it returns the unificand of two types
- (their most general common instance). The two arguments are physically
- modified in order to represent the same type. The unification function will
- detect type clashes.
-
- #let rec unify (tau1,tau2) =
- # match (shorten tau1, shorten tau2)
- # with (* type variable n and type variable m *)
- # (TypeVar({Index=n; Value=Unknown} as tv1) as t1),
- # (TypeVar({Index=m; Value=Unknown} as tv2) as t2)
- # -> if n <> m then tv1.Value <- t2
- # | (* type t1 and type variable *)
- # t1, (TypeVar ({Index=_;Value=Unknown} as tv) as t2)
- # -> if not(occurs tv t1) then tv.Value <- t1
- # else raise (TypeClash (t1,t2))
- # | (* type variable and type t2 *)
- # (TypeVar ({Index=_;Value=Unknown} as tv) as t1), t2
- # -> if not(occurs tv t2) then tv.Value <- t2
- # else raise (TypeClash (t1,t2))
- # | Number, Number -> ()
- # | Arrow(t1,t2), (Arrow(t'1,t'2) as t)
- # -> unify(t1,t'1); unify(t2,t'2)
- # | (t1,t2) -> raise (TypeClash (t1,t2));;
- unify : asl_type * asl_type -> unit = <fun>
-
-
- 15.3.3 Representation of typing environments
-
- We use asl_type_scheme list as typing environments, and we will use the
- encoding of variables as indices into the environment.
- The initial environment is a list of types (Number -> (Number -> Number)),
- which are the types of the ASL primitive functions.
-
- #let init_typing_env =
- # map (function s ->
- # Forall([],Arrow(Number,
- # Arrow(Number,Number))))
- # init_env;;
-
- The global typing environment is initialized to the initial typing
- environment, and will be updated with the type of each ASL declaration,
- after they are type-checked.
-
-
- 107
-
-
-
- #let global_typing_env = ref init_typing_env;;
-
-
- 15.3.4 From types to type schemes: generalization
-
- In order to implement generalization, we will need some functions collecting
- types variables occurring in ASL types.
- The following function computes the list of type variables of its
- argument.
-
- #let vars_of_type tau =
- # let rec vars vs = function
- # Number -> vs
- # | TypeVar {Index=n; Value=Unknown}
- # -> if mem n vs then vs else n::vs
- # | TypeVar {Index=_; Value= t} -> vars vs t
- # | Arrow(t1,t2) -> vars (vars vs t1) t2
- # | Unknown -> raise (TypingBug "vars_of_type")
- # in vars [] tau
- #;;
- vars_of_type : asl_type -> int list = <fun>
-
- The unknowns_of_type(bv,t) application returns the list of variables
- occurring in t that do not appear in bv. The subtract function returns the
- difference of two lists.
-
- #let unknowns_of_type (bv,t) =
- # subtract (vars_of_type t) bv;;
- unknowns_of_type : int list * asl_type -> int list = <fun>
-
- We need to compute the list of unknowns of a type environment for the
- generalization process (unknowns belonging to that list cannot become
- generic). The set of unknowns of a type environment is the union of the
- unknowns of each type. The flat function flattens a list of lists.
-
- #let flat = it_list (prefix @) [];;
- flat : '_a list list -> '_a list = <fun>
-
- #let unknowns_of_type_env env =
- # flat (map (function Forall(gv,t) -> unknowns_of_type (gv,t)) env);;
- unknowns_of_type_env : asl_type_scheme list -> int list = <fun>
-
- The generalization of a type is relative to a typing environment. The
- make_set function eliminates duplicates in its list argument.
-
- #let rec make_set = function
- # [] -> []
- # | x::l -> if mem x l then make_set l else x :: make_set l;;
- make_set : 'a list -> 'a list = <fun>
-
- #let generalise_type (gamma, tau) =
-
-
- 108
-
-
-
- # let genvars =
- # make_set (subtract (vars_of_type tau)
- # (unknowns_of_type_env gamma))
- # in Forall(genvars, tau)
- #;;
- generalise_type : asl_type_scheme list * asl_type -> asl_type_scheme = <fun>
-
-
- 15.3.5 From type schemes to types: generic instantiation
-
- The following function returns a generic instance of its type scheme
- argument. A generic instance is obtained by replacing all generic type
- variables by new unknowns:
-
- #let gen_instance (Forall(gv,tau)) =
- # (* We associate a new unknown to each generic variable *)
- # let unknowns = map (function n -> n, TypeVar(new_vartype())) gv in
- # let rec ginstance = function
- # (TypeVar {Index=n; Value=Unknown} as t) ->
- # (try assoc n unknowns
- # with Not_found -> t)
- # | TypeVar {Index=_; Value= t} -> ginstance t
- # | Number -> Number
- # | Arrow(t1,t2) -> Arrow(ginstance t1, ginstance t2)
- # | Unknown -> raise (TypingBug "gen_instance")
- # in ginstance tau
- #;;
- gen_instance : asl_type_scheme -> asl_type = <fun>
-
-
- 15.3.6 The ASL type synthesizer
-
- The type synthesizer is the asl_typing_expr function. Each of its match
- cases corresponds to an inference rule given above.
-
- #let rec asl_typing_expr gamma =
- # let rec type_rec = function
- # Const _ -> Number
- # | Var n ->
- # let sigma =
- # try nth n gamma
- # with Failure _ -> raise (TypingBug "Unbound")
- # in gen_instance sigma
- # | Cond (e1,e2,e3) ->
- # unify(Number, type_rec e1);
- # let t2 = type_rec e2 and t3 = type_rec e3
- # in unify(t2, t3); t3
- # | App((Abs(x,e2) as f), e1) -> (* LET case *)
- # let t1 = type_rec e1 in
- # let sigma = generalise_type (gamma,t1)
- # in asl_typing_expr (sigma::gamma) e2
-
-
- 109
-
-
-
- # | App(e1,e2) ->
- # let u = TypeVar(new_vartype())
- # in unify(type_rec e1,Arrow(type_rec e2,u)); u
- # | Abs(x,e) ->
- # let u = TypeVar(new_vartype()) in
- # let s = Forall([],u)
- # in Arrow(u,asl_typing_expr (s::gamma) e)
- # in type_rec;;
- asl_typing_expr : asl_type_scheme list -> asl -> asl_type = <fun>
-
-
- 15.3.7 Typing, trapping type clashes and printing ASL types
-
- Now, we define some auxiliary functions in order to build a ``good-looking''
- type synthesizer. First of all, a printing routine for ASL type schemes is
- defined (using a function tvar_name which computes a decent name for type
- variables).
-
- #let tvar_name n =
- # (* Computes a name "'a", ... for type variables, *)
- # (* given an integer n representing the position *)
- # (* of the type variable in the list of generic *)
- # (* type variables *)
- # let rec name_of n =
- # let q,r = (n / 26), (n mod 26) in
- # let s = make_string 1 (char_of_int (96+r)) in
- # if q=0 then s else (name_of q)^s
- # in "'"^(name_of n)
- #;;
- tvar_name : int -> string = <fun>
-
- Then a printing function for type schemes.
-
- #let print_type_scheme (Forall(gv,t)) =
- # (* Prints a type scheme. *)
- # (* Fails when it encounters an unknown *)
- # let names = let rec names_of = function
- # (n,[]) -> []
- # | (n,(v1::Lv)) -> (tvar_name n)::(names_of (n+1, Lv))
- # in names_of (1,gv) in
- # let tvar_names = combine (rev gv, names) in
- # let rec print_rec = function
- # TypeVar{Index=n; Value=Unknown} ->
- # let name = try assoc n tvar_names
- # with Not_found ->
- # raise (TypingBug "Non generic variable")
- # in print_string name
- # | TypeVar{Index=_;Value=t} -> print_rec t
- # | Number -> print_string "Number"
- # | Arrow(t1,t2) ->
- # print_string "("; print_rec t1;
-
-
- 110
-
-
-
- # print_string " -> "; print_rec t2;
- # print_string ")"
- # | Unknown -> raise (TypingBug "print_type_scheme")
- # in print_rec t
- #;;
- Toplevel input:
- > | (n,(v1::Lv)) -> (tvar_name n)::(names_of (n+1, Lv))
- > ^^
- Warning: the variable Lv starts with an upper case letter in this pattern.
- print_type_scheme : asl_type_scheme -> unit = <fun>
-
- Now, the main function which resets the type variables counter, calls the
- type synthesizer, traps ASL type clashes and prints the resulting types. At
- the end, the global environments are updated.
-
- #let typing (Decl(s,e)) =
- # reset_vartypes();
- # let tau = (* TYPING *)
- # try asl_typing_expr !global_typing_env e
- # with TypeClash(t1,t2) -> (* A typing error *)
- # let vars=vars_of_type(t1)@vars_of_type(t2) in
- # print_string "ASL Type clash between ";
- # print_type_scheme (Forall(vars,t1));
- # print_string " and ";
- # print_type_scheme (Forall(vars,t2));
- # print_newline();
- # raise (Failure "ASL typing") in
- # let sigma = generalise_type (!global_typing_env,tau) in
- # (* UPDATING ENVIRONMENTS *)
- # global_env := s::!global_env;
- # global_typing_env := sigma::!global_typing_env;
- # reset_vartypes ();
- # (* PRINTING RESULTING TYPE *)
- # print_string "ASL Type of ";
- # print_string s;
- # print_string " is ";
- # print_type_scheme sigma; print_newline();;
- typing : top_asl -> unit = <fun>
-
-
- 15.3.8 Typing ASL programs
-
- We reinitialize the parsing environment:
-
- #global_env:=init_env; ();;
- - : unit = ()
-
- Now, let us run some examples through the ASL type checker:
-
- #typing (parse_top "let x be 1;");;
- ASL Type of x is Number
-
-
- 111
-
-
-
- - : unit = ()
-
- #typing (parse_top "+ 2 ((\\x.x) 3);");;
- ASL Type of it is Number
- - : unit = ()
-
- #typing (parse_top "if + 0 1 then 1 else 0 fi;");;
- ASL Type of it is Number
- - : unit = ()
-
- #typing (parse_top "let id be \\x.x;");;
- ASL Type of id is ('a -> 'a)
- - : unit = ()
-
- #typing (parse_top "+ (id 1) (id id 2);");;
- ASL Type of it is Number
- - : unit = ()
-
- #typing (parse_top "let f be (\\x.x x) (\\x.x);");;
- ASL Type of f is ('a -> 'a)
- - : unit = ()
-
- #typing (parse_top "+ (\\x.x) 1;");;
- ASL Type clash between Number and ('a -> 'a)
- Uncaught exception: Failure "ASL typing"
-
-
- 15.3.9 Typing and recursion
-
- The Z fixpoint combinator does not have a type in Milner's type system:
-
- #typing (parse_top
- # "let fix be \\f.((\\x.f(\\z.(x x)z)) (\\x.f(\\z.(x x)z)));");;
- ASL Type clash between 'a and ('a -> 'b)
- Uncaught exception: Failure "ASL typing"
-
- This is because we try to apply x to itself, and the type of x is not
- polymorphic. In fact, no fixpoint combinator is typable in ASL. This is why
- we need a special primitive or syntactic construct in order to express
- recursivity.
- If we want to assign types to recursive programs, we have to predefine the
- Z fixpoint combinator. Its type scheme should be for all a.((a->a)->a),
- because we take fixpoints of functions.
-
- #global_env := "fix"::init_env;
- #global_typing_env:=
- # (Forall([1],
- # Arrow(Arrow(TypeVar{Index=1;Value=Unknown},
- # TypeVar{Index=1;Value=Unknown}),
- # TypeVar{Index=1;Value=Unknown})))
- # ::init_typing_env;
-
-
- 112
-
-
-
- #();;
- - : unit = ()
-
- We can now define our favorite functions as:
-
- #typing (parse_top
- # "let fact be fix (\\f.(\\n. if = n 0 then 1
- # else * n (f (- n 1))
- # fi));");;
- ASL Type of fact is (Number -> Number)
- - : unit = ()
-
- #typing (parse_top "fact 8;");;
- ASL Type of it is Number
- - : unit = ()
-
- #typing (parse_top
- # "let fib be fix (\\f.(\\n. if = n 1 then 1
- # else if = n 2 then 1
- # else + (f(- n 1)) (f(- n 2))
- # fi
- # fi));");;
- ASL Type of fib is (Number -> Number)
- - : unit = ()
-
- #typing (parse_top "fib 9;");;
- ASL Type of it is Number
- - : unit = ()
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 113
-
-
-
-
-
-
-
-
-
- Chapter 16
-
-
-
- Compiling ASL to an abstract machine code
-
-
-
- In order to fully take advantage of the static typing of ASL programs, we
- have to:
-
- - either write a new interpreter without type tests (difficult, because we
- used pattern-matching in order to realize type tests);
-
- - or design an untyped machine and produce code for it.
-
- We choose here the second solution: it will permit us to give some
- intuition about the compiling process of functional languages, and to
- describe a typical execution model for (strict) functional languages. The
- machine that we will use is a simplified version the Categorical Abstract
- Machine (CAM, for short).
- We will call CAM our abstract machine, despite its differences with the
- original CAM. For more informations on the CAM, see [8, 26].
-
-
- 16.1 The Abstract Machine
-
- The execution model is a stack machine (i.e. a machine using a stack). In
- this section, we define in Caml the states of the CAM and its instructions.
- A state is composed of:
-
- - a register (holding values and environments),
-
- - a program counter, represented here as a list of instructions whose
- first element is the current instruction being executed,
-
- - and a stack represented as a list of code addresses (instruction lists),
- values and environments.
-
- The first Caml type that we need is the type for CAM instructions. We will
- study later the effect of each instruction.
-
- #type instruction =
- # Quote of int (* Integer constants *)
- #| Plus | Minus (* Arithmetic operations *)
-
-
- 114
-
-
-
- #| Divide | Equal | Times
- #| Nth of int (* Variable accesses *)
- #| Branch of instruction list * instruction list
- # (* Conditional execution *)
- #| Push (* Pushes onto the stack *)
- #| Swap (* Exch. top of stack and register *)
- #| Clos of instruction list (* Builds a closure with the current environ-
- ment *)
- #| Apply (* Function application *)
- #;;
- Type instruction defined.
-
- We need a new type for semantic values since instruction lists have now
- replaced abstract syntax trees. The semantic values are merged in a type
- object. The type object behaves as data in a computer memory: we need
- higher-level information (such as type information) in order to interpret
- them. Furthermore, some data do not correspond to anything (for example an
- environment composed of environments represents neither an ASL value nor an
- intermediate data in a legal computation process).
-
- #type object = Constant of int
- # | Closure of object * object
- # | Address of instruction list
- # | Environment of object list
- #;;
- Type object defined.
-
- The type state is a product type with mutable components.
-
- #type state = {mutable Reg: object;
- # mutable PC: instruction list;
- # mutable Stack: object list}
- #;;
- Type state defined.
-
- Now, we give the operational semantics of CAM instructions. The effect of
- an instruction is to change the state configuration. This is what we
- describe now with the step function. Code executions will be arbitrary
- iterations of this function.
-
- #exception CAMbug of string;;
- Exception CAMbug defined.
-
- #exception CAM_End of object;;
- Exception CAM_End defined.
-
- #let step state = match state with
- # {Reg=_; PC=Quote(n)::code; Stack=s} ->
- # state.Reg <- Constant(n); state.PC <- code
- #
- #| {Reg=Constant(m); PC=Plus::code; Stack=Constant(n)::s} ->
-
-
- 115
-
-
-
- # state.Reg <- Constant(n+m); state.Stack <- s;
- # state.PC <- code
- #
- #| {Reg=Constant(m); PC=Minus::code; Stack=Constant(n)::s} ->
- # state.Reg <- Constant(n-m); state.Stack <- s;
- # state.PC <- code
- #
- #| {Reg=Constant(m); PC=Times::code; Stack=Constant(n)::s} ->
- # state.Reg <- Constant(n*m); state.Stack <- s;
- # state.PC <- code
- #
- #| {Reg=Constant(m); PC=Divide::code; Stack=Constant(n)::s} ->
- # state.Reg <- Constant(n/m); state.Stack <- s;
- # state.PC <- code
- #
- #| {Reg=Constant(m); PC=Equal::code; Stack=Constant(n)::s} ->
- # state.Reg <- Constant(if n=m then 1 else 0);
- # state.Stack <- s; state.PC <- code
- #
- #| {Reg=Constant(m); PC=Branch(code1,code2)::code; Stack=r::s} ->
- # state.Reg <- r;
- # state.Stack <- Address(code)::s;
- # state.PC <- (if m=0 then code2 else code1)
- #
- #| {Reg=r; PC=Push::code; Stack=s} ->
- # state.Stack <- r::s; state.PC <- code
- #
- #| {Reg=r1; PC=Swap::code; Stack=r2::s} ->
- # state.Reg <- r2; state.Stack <- r1::s;
- # state.PC <- code
- #
- #| {Reg=r; PC=Clos(code1)::code; Stack=s} ->
- # state.Reg <- Closure(Address(code1),r);
- # state.PC <- code
- #
- #| {Reg=_; PC=[]; Stack=Address(code)::s} ->
- # state.Stack <- s; state.PC <- code
- #
- #| {Reg=v; PC=Apply::code;
- # Stack=Closure(Address(code1),Environment(e))::s} ->
- # state.Reg <- Environment(v::e);
- # state.Stack <- Address(code)::s;
- # state.PC <- code1
- #
- #| {Reg=v; PC=[]; Stack=[]} ->
- # raise (CAM_End v)
- #| {Reg=_; PC=(Plus|Minus|Times|Divide|Equal)::code; Stack=_::_} ->
- # raise (CAMbug "IllTyped")
- #
- #| {Reg=Environment(e); PC=Nth(n)::code; Stack=_} ->
- # state.Reg <- (try nth n e
-
-
- 116
-
-
-
- # with Failure _ -> raise (CAMbug "IllTyped"));
- # state.PC <- code
- #| _ -> raise (CAMbug "Wrong configuration")
- #;;
- step : state -> unit = <fun>
-
- We may notice that the empty code sequence denotes a (possibly final) return
- instruction.
- We could argue that pattern-matching in the Camlstep function implements a
- kind of dynamic typechecking. In fact, in a concrete (low-level)
- implementation of the machine (expansion of the CAM instructions in assembly
- code, for example), these tests would not appear. They are useless since we
- trust the typechecker and the compiler. So, any execution error in a real
- implementation comes from a bug in one of the above processes and would lead
- to memory errors or illegal instructions (usually detected by the computer's
- operating system).
-
-
- 16.2 Compiling ASL programs into CAM code
-
- We give in this section a compiling function taking the abstract syntax tree
- of an ASL expression and producing CAM code. The compilation scheme is very
- simple:
-
- - the code of a constant is Quote;
-
- - a variable is compiled as an access to the appropriate component of the
- current environment (Nth);
-
- - the code of a conditional expression will save the current environment
- (Push), evaluate the condition part, and, according to the boolean value
- obtained, select the appropriate code to execute (Branch);
-
- - the code of an application will also save the environment on the stack
- (Push), execute the function part of the application, then exchange the
- functional value and the saved environment (Swap), evaluate the argument
- and, finally, apply the functional value (which is at the top of the
- stack) to the argument held in the register with the Apply instruction;
-
- - the code of an abstraction simply consists in building a closure
- representing the functional value: the closure is composed of the code
- of the function and the current environment.
-
- Here is the compiling function:
-
- #let rec code_of = function
- # Const(n) -> [Quote(n)]
- #| Var n -> [Nth(n)]
- #| Cond(e_test,e_t,e_f) ->
- # Push::(code_of e_test)
- # @[Branch(code_of e_t, code_of e_f)]
-
-
- 117
-
-
-
- #| App(e1,e2) -> Push::(code_of e1)
- # @[Swap]@(code_of e2)
- # @[Apply]
- #| Abs(_,e) -> [Clos(code_of e)];;
- code_of : asl -> instruction list = <fun>
-
- A global environment is needed in order to maintain already defined values.
- Any CAM execution will start in a state whose register part contains this
- global environment.
-
- #let init_CAM_env =
- # let basic_instruction = function
- # "+" -> Plus
- # | "-" -> Minus
- # | "*" -> Times
- # | "/" -> Divide
- # | "=" -> Equal
- # | s -> raise (CAMbug "Unknown primitive")
- # in map (function s ->
- # Closure(Address[Clos(Push::Nth(2)
- # ::Swap::Nth(1)
- # ::[basic_instruction s])],
- # Environment []))
- # init_env;;
- init_CAM_env : object list =
- [Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Plus]], Environment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Minus]], Environment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Times]], Environment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Divide]], Environment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Equal]], Environment [])]
-
- #let global_CAM_env = ref init_CAM_env;;
- global_CAM_env : object list ref =
- ref
- [Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Plus]], Environ-
- ment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Minus]], Environ-
- ment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Times]], Environment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Divide]], Environment []);
- Closure (Address [Clos [Push; Nth 2; Swap; Nth 1; Equal]], Environment [])]
-
- As an example, here is the code for some ASL expressions.
-
- #code_of (expression(lexer(stream_of_string "1;")));;
- - : instruction list = [Quote 1]
-
- #code_of (expression(lexer(stream_of_string "+ 1 2;")));;
- - : instruction list =
- [Push; Push; Nth 6; Swap; Quote 1; Apply; Swap; Quote 2; Apply]
-
-
-
- 118
-
-
-
- #code_of (expression(lexer(stream_of_string "(\\x.x) ((\\x.x) 0);")));;
- - : instruction list =
- [Push; Clos [Nth 1]; Swap; Push; Clos [Nth 1]; Swap; Quote 0; Apply; Apply]
-
- #code_of (expression(lexer(stream_of_string
- # "+ 1 (if 0 then 2 else 3 fi);")));;
- - : instruction list =
- [Push; Push; Nth 6; Swap; Quote 1; Apply; Swap; Push; Quote 0;
- Branch ([Quote 2], [Quote 3]); Apply]
-
-
-
- 16.3 Execution of CAM code
-
- The main function for executing compiled ASL manages the global environment
- until execution has succeeded.
-
- #let run (Decl(s,e)) =
- # (* TYPING *)
- # reset_vartypes();
- # let tau =
- # try asl_typing_expr !global_typing_env e
- # with TypeClash(t1,t2) ->
- # let vars=vars_of_type(t1) @ vars_of_type(t2) in
- # print_string "ASL Type clash between ";
- # print_type_scheme (Forall(vars,t1));
- # print_string " and ";
- # print_type_scheme (Forall(vars,t2));
- # raise (Failure "ASL typing")
- # | Unbound s -> raise (TypingBug ("Unbound: "^s)) in
- # let sigma = generalise_type (!global_typing_env,tau) in
- # (* PRINTING TYPE INFORMATION *)
- # print_string "ASL Type of ";
- # print_string s; print_string " is ";
- # print_type_scheme sigma; print_newline();
- # (* COMPILING *)
- # let code = code_of e in
- # let state = {Reg=Environment(!global_CAM_env); PC=code; Stack=[]} in
- # (* EXECUTING *)
- # let result = try while true do step state done; state.Reg
- # with CAM_End v -> v in
- # (* UPDATING ENVIRONMENTS *)
- # global_env := s::!global_env;
- # global_typing_env := sigma::!global_typing_env;
- # global_CAM_env := result::!global_CAM_env;
- # (* PRINTING RESULT *)
- # (match result
- # with Constant(n) -> print_int n
- # | Closure(_,_) -> print_string "<funval>"
- # | _ -> raise (CAMbug "Wrong state configuration"));
- # print_newline();;
-
-
- 119
-
-
-
- run : top_asl -> unit = <fun>
-
- Now, let us run some examples:
-
- #(* Reinitializing environments *)
- #global_env:=init_env;
- #global_typing_env:=init_typing_env;
- #global_CAM_env:=init_CAM_env;
- #();;
- - : unit = ()
-
- #run (parse_top "1;");;
- ASL Type of it is Number
- 1
- - : unit = ()
-
- #run (parse_top "+ 1 2;");;
- ASL Type of it is Number
- 3
- - : unit = ()
-
- #run (parse_top "(\\f.(\\x.f x)) (\\x. + x 1) 3;");;
- ASL Type of it is Number
- 4
- - : unit = ()
-
- We may now introduce the Z fixpoint combinator as a predefined function fix.
-
- #begin
- # global_env:="fix"::init_env;
- # global_typing_env:=
- # (Forall([1],
- # Arrow(Arrow(TypeVar{Index=1;Value=Unknown},
- # TypeVar{Index=1;Value=Unknown}),
- # TypeVar{Index=1;Value=Unknown})))
- # ::init_typing_env;
- # global_CAM_env:=
- # (match code_of (expression(lexer(stream_of_string
- # "\\f.((\\x.f(\\z.(x x)z)) (\\x.f(\\z.(x x)z)));")))
- # with [Clos(C)] -> Closure(Address(C), Environment [])
- # | _ -> raise (CAMbug "Wrong code for fix"))
- # ::init_CAM_env
- #end;;
- Toplevel input:
- > with [Clos(C)] -> Closure(Address(C), Environment [])
- > ^
- Warning: the variable C starts with an upper case letter in this pattern.
- - : unit = ()
-
-
- #run (parse_top
-
-
- 120
-
-
-
- # "let fact be fix (\\f.(\\n. if = n 0 then 1
- # else * n (f (- n 1))
- # fi));");;
- ASL Type of fact is (Number -> Number)
- <funval>
- - : unit = ()
-
- #run (parse_top
- # "let fib be fix (\\f.(\\n. if = n 1 then 1
- # else if = n 2 then 1
- # else + (f(- n 1)) (f(- n 2))
- # fi
- # fi));");;
- ASL Type of fib is (Number -> Number)
- <funval>
- - : unit = ()
-
- #run (parse_top "fact 8;");;
- ASL Type of it is Number
- 40320
- - : unit = ()
-
- #run (parse_top "fib 9;");;
- ASL Type of it is Number
- 34
- - : unit = ()
-
- It is of course possible (and desirable) to introduce recursion by using a
- specific syntactic construct, special instructions and a dedicated case to
- the compiling function. See [26] for efficient compilation of recursion,
- data structures etc.
-
- Exercise 16.1 Interesting exercises for which we won't give solutions
- consist in enriching according to your taste the ASL language. Also,
- building a standalone ASL interpreter is a good exercise in modular
- programming.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 121
-
-
-
-
-
-
-
-
-
- Chapter 17
-
-
-
- Answers to exercises
-
-
-
- We give in this chapter one possible solution for each exercise contained in
- this document. Exercises are referred to by their number and the page where
- they have been proposed: for example, ``2.1, p. 15'' refers to the first
- exercise in chapter 2; this exercise is located on page 15.
-
- 3.1, p. 17
-
- The following (anonymous) functions have the required types:
- #function f -> (f 2)+1;;
- - : (int -> int) -> int = <fun>
-
- #function m -> (function n -> n+m+1);;
- - : int -> int -> int = <fun>
-
- #(function f -> (function m -> f(m+1) / 2));;
- - : (int -> int) -> int -> int = <fun>
-
-
- 3.2, p. 17
-
- We must first rename y to z, obtaining:
-
- (function x -> (function z -> x+z))
-
- and finally:
-
- (function y -> (function z -> y+z))
-
- Without renaming, we would have obtained:
-
- (function y -> (function y -> y+y))
-
- which does not denote the same function.
-
-
-
-
-
- 122
-
-
-
- 3.3, p. 17
-
- We write successively the reduction steps for each expressions, and then we
- use Caml in order to check the result.
-
- - let x=1+2 in ((function y -> y+x) x);;
- (function y -> y+(1+2)) (1+2);;
- (function y -> y+(1+2)) 3;;
- 3+(1+2);;
- 3+3;;
- 6;;
-
- Caml says:
-
- #let x=1+2 in ((function y -> y+x) x);;
- - : int = 6
-
-
-
- - let x=1+2 in ((function x -> x+x) x);;
- (function x -> x+x) (1+2);;
- 3+3;;
- 6;;
-
- Caml says:
-
- #let x=1+2 in ((function x -> x+x) x);;
- - : int = 6
-
-
-
- - let f1 = function f2 -> (function x -> f2 x)
- in let g = function x -> x+1
- in f1 g 2;;
- let g = function x -> x+1
- in function f2 -> (function x -> f2 x) g 2;;
- (function f2 -> (function x -> f2 x)) (function x -> x+1) 2;;
- (function x -> (function x -> x+1) x) 2;;
- (function x -> x+1) 2;;
- 2+1;;
- 3;;
-
- Caml says:
-
- #let f1 = function f2 -> (function x -> f2 x)
- #in let g = function x -> x+1
- # in f1 g 2;;
- - : int = 3
-
-
-
-
-
- 123
-
-
-
- 4.1, p. 29
-
- To compute the surface area of a rectangle and the volume of a sphere:
-
- #let surface_rect len wid = len * wid;;
- surface_rect : int -> int -> int = <fun>
-
- #let pi = 4.0 *. atan 1.0;;
- pi : float = 3.14159265359
-
- #let volume_sphere r = 4.0 /. 3.0 *. pi *. (power r 3.);;
- volume_sphere : float -> float = <fun>
-
-
- 4.2, p. 29
-
- In a call-by-value language without conditional construct (and without sum
- types), all programs involving a recursive definition never terminate.
-
- 4.3, p. 29
-
- #let rec factorial n = if n=1 then 1 else n*(factorial(n-1));;
- factorial : int -> int = <fun>
-
- #factorial 5;;
- - : int = 120
-
- #let tail_recursive_factorial n =
- # let rec fact n m = if n=1 then m else fact (n-1) (n*m)
- # in fact n 1;;
- tail_recursive_factorial : int -> int = <fun>
-
- #tail_recursive_factorial 5;;
- - : int = 120
-
-
- 4.4, p. 29
-
- #let rec fibonacci n =
- # if n=1 then 1
- # else if n=2 then 1
- # else fibonacci(n-1) + fibonacci(n-2);;
- fibonacci : int -> int = <fun>
-
- #fibonacci 20;;
- - : int = 6765
-
-
-
-
-
-
- 124
-
-
-
- 4.5, p. 29
-
- #let compose f g = function x -> f (g (x));;
- compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
-
- #let curry f = function x -> (function y -> f(x,y));;
- curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c = <fun>
-
- #let uncurry f = function (x,y) -> f x y;;
- uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c = <fun>
-
- #uncurry compose;;
- - : ('_a -> '_b) * ('_c -> '_a) -> '_c -> '_b = <fun>
-
- #compose curry uncurry;;
- - : ('_a -> '_b -> '_c) -> '_a -> '_b -> '_c = <fun>
-
- #compose uncurry curry;;
- - : ('_a * '_b -> '_c) -> '_a * '_b -> '_c = <fun>
-
-
- 5.1, p. 34
-
- #let rec combine =
- # function [],[] -> []
- # | (x1::l1),(x2::l2) -> (x1,x2)::(combine(l1,l2))
- # | _ -> raise (Failure "combine: lists of different length");;
- combine : 'a list * 'b list -> ('a * 'b) list = <fun>
-
- #combine ([1;2;3],["a";"b";"c"]);;
- - : (int * string) list = [1, "a"; 2, "b"; 3, "c"]
-
- #combine ([1;2;3],["a";"b"]);;
- Uncaught exception: Failure "combine: lists of different length"
-
-
- 5.2, p. 35
-
- #let rec sublists =
- # function [] -> [[]]
- # | x::l -> let sl = sublists l
- # in sl @ (map (fun l -> x::l) sl);;
- sublists : 'a list -> 'a list list = <fun>
-
- #sublists [];;
- - : '_a list list = [[]]
-
- #sublists [1;2;3];;
- - : int list list = [[]; [3]; [2]; [2; 3]; [1]; [1; 3]; [1; 2]; [1; 2; 3]]
-
- #sublists ["a"];;
-
-
- 125
-
-
-
- - : string list list = [[]; ["a"]]
-
-
- 6.1, p. 45
-
- #type ('a,'b) btree = Leaf of 'b
- # | Btree of ('a,'b) node
- #and ('a,'b) node = {Op:'a;
- # Son1: ('a,'b) btree;
- # Son2: ('a,'b) btree};;
- Type btree defined.
- Type node defined.
-
- #let rec nodes_and_leaves =
- # function Leaf x -> ([],[x])
- # | Btree {Op=x; Son1=s1; Son2=s2} ->
- # let (nodes1,leaves1) = nodes_and_leaves s1
- # and (nodes2,leaves2) = nodes_and_leaves s2
- # in (x::nodes1@nodes2, leaves1@leaves2);;
- nodes_and_leaves : ('a, 'b) btree -> 'a list * 'b list = <fun>
-
- #nodes_and_leaves (Btree {Op="+"; Son1=Leaf 1; Son2=Leaf 2});;
- - : string list * int list = ["+"], [1; 2]
-
-
- 6.2, p. 45
-
- #let rec map_btree f g = function
- # Leaf x -> Leaf (f x)
- # | Btree {Op=op; Son1=s1; Son2=s2}
- # -> Btree {Op=g op; Son1=map_btree f g s1;
- # Son2=map_btree f g s2};;
- map_btree : ('a -> 'b) -> ('c -> 'd) -> ('c, 'a) btree -> ('d, 'b) btree =
- <fun>
-
-
- 6.3, p. 45
-
- We need to give a functional interpretation to btree data constructors. We
- use f (resp. g) to denote the function associated to the Leaf (resp.
- Btree) data constructor, obtaining the following Caml definition:
-
- #let rec btree_it f g = function
- # Leaf x -> f x
- # | Btree{Op=op; Son1=s1; Son2=s2}
- # -> g op (btree_it f g s1) (btree_it f g s2)
- #;;
- btree_it : ('a -> 'b) -> ('c -> 'b -> 'b -> 'b) -> ('c, 'a) btree -> 'b =
- <fun>
-
-
-
- 126
-
-
-
- #btree_it (function x -> x)
- # (function "+" -> prefix +
- # | _ -> raise (Failure "Unknown op"))
- # (Btree {Op="+"; Son1=Leaf 1; Son2=Leaf 2});;
- - : int = 3
-
-
- 7.1, p. 53
-
- #type ('a,'b) lisp_cons = {mutable Car:'a; mutable Cdr:'b};;
- Type lisp_cons defined.
-
- #let car p = p.Car
- #and cdr p = p.Cdr
- #and rplaca p v = p.Car <- v
- #and rplacd p v = p.Cdr <- v;;
- car : ('a, 'b) lisp_cons -> 'a = <fun>
- cdr : ('a, 'b) lisp_cons -> 'b = <fun>
- rplaca : ('a, 'b) lisp_cons -> 'a -> unit = <fun>
- rplacd : ('a, 'b) lisp_cons -> 'b -> unit = <fun>
-
- #let p = {Car=1; Cdr=true};;
- p : (int, bool) lisp_cons = {Car=1; Cdr=true}
-
- #rplaca p 2;;
- - : unit = ()
-
- #p;;
- - : (int, bool) lisp_cons = {Car=2; Cdr=true}
-
-
- 7.2, p. 54
-
- #let stamp_counter = ref 0;;
- stamp_counter : int ref = ref 0
-
- #let stamp () =
- # stamp_counter := 1 + !stamp_counter; !stamp_counter;;
- stamp : unit -> int = <fun>
-
- #stamp();;
- - : int = 1
-
- #stamp();;
- - : int = 2
-
-
- 7.3, p. 54
-
- #let exchange t i j =
-
-
- 127
-
-
-
- # let v = t.(i) in vect_assign t i t.(j); vect_assign t j v
- #;;
- exchange : 'a vect -> int -> int -> unit = <fun>
-
- #let quick_sort t =
- # let rec quick lo hi =
- # if lo < hi
- # then begin
- # let i = ref lo
- # and j = ref hi
- # and p = t.(hi) in
- # while !i < !j
- # do
- # while !i < hi & t.(!i) <=. p do incr i done;
- # while !j > lo & p <=. t.(!j) do decr j done;
- # if !i < !j then exchange t !i !j
- # done;
- # exchange t hi !i;
- # quick lo (!i - 1);
- # quick (!i + 1) hi
- # end
- # else ()
- # in quick 0 (vect_length t - 1)
- #;;
- quick_sort : float vect -> unit = <fun>
-
- #let a = [| 2.0; 1.5; 4.0; 0.0; 10.0; 1.0 |];;
- a : float vect = [|2.0; 1.5; 4.0; 0.0; 10.0; 1.0|]
-
- #quick_sort a;;
- - : unit = ()
-
- #a;;
- - : float vect = [|0.0; 1.0; 1.5; 2.0; 4.0; 10.0|]
-
-
- 8.1, p. 58
-
- #let rec find_succeed f = function
- # [] -> raise (Failure "find_succeed")
- # | x::l -> try f x; x with _ -> find_succeed f l
- #;;
- find_succeed : ('a -> 'b) -> 'a list -> 'a = <fun>
-
- #let hd = function [] -> raise (Failure "empty") | x::l -> x;;
- hd : 'a list -> 'a = <fun>
-
- #find_succeed hd [[];[];[1;2];[3;4]];;
- - : int list = [1; 2]
-
-
-
-
- 128
-
-
-
- 8.2, p. 58
-
- #let rec map_succeed f = function
- # [] -> []
- # | h::t -> try (f h)::(map_succeed f t)
- # with _ -> map_succeed f t;;
- map_succeed : ('a -> 'b) -> 'a list -> 'b list = <fun>
-
- #map_succeed hd [[];[1];[2;3];[4;5;6]];;
- - : int list = [1; 2; 4]
-
-
- 9.1, p. 65
-
- The first function (copy) that we define assumes that its arguments are
- respectively the input and the output channel. They are assumed to be
- already opened.
-
- #let copy inch outch =
- # (* inch and outch are supposed to be opened channels *)
- # try (* actual copying *)
- # while true
- # do output_char outch (input_char inch)
- # done
- # with End_of_file -> (* Normal termination *)
- # raise End_of_file
- # | sys__Sys_error msg -> (* Abnormal termination *)
- # prerr_endline msg;
- # raise (Failure "cp")
- # | _ -> (* Unknow exception, maybe interruption? *)
- # prerr_endline "Unknown error while copying";
- # raise (Failure "cp")
- #;;
- copy : in_channel -> out_channel -> unit = <fun>
-
- The next function opens channels connected to its filename arguments, and
- calls copy on these channels. The advantage of dividing the code into two
- functions is that copy performs the actual work, and can be reused in
- different applications, while the role of cp is more ``administrative'' in
- the sense that it does nothing but opening and closing channels and printing
- possible error messages.
-
- #let cp f1 f2 =
- # (* Opening channels, f1 first, then f2 *)
- # let inch =
- # try open_in f1
- # with sys__Sys_error msg ->
- # prerr_endline (f1^": "^msg);
- # raise (Failure "cp")
- # | _ -> prerr_endline ("Unknown exception while opening "^f1);
- # raise (Failure "cp")
-
-
- 129
-
-
-
- # in
- # let outch =
- # try open_out f2
- # with sys__Sys_error msg ->
- # close_in inch;
- # prerr_endline (f2^": "^msg);
- # raise (Failure "cp")
- # | _ -> close_in inch;
- # prerr_endline ("Unknown exception while opening "^f2);
- # raise (Failure "cp")
- # in (* Copying and then closing *)
- # try copy inch outch
- # with End_of_file -> close_in inch; close_out outch
- # (* close_out flushes *)
- # | exc -> close_in inch; close_out outch; raise exc
- #;;
- cp : string -> string -> unit = <fun>
-
- Let us try cp:
-
- #cp "/etc/passwd" "/tmp/foo";;
- - : unit = ()
-
- #cp "/tmp/foo" "/foo";;
- /foo: /foo: Permission denied
- Uncaught exception: Failure "cp"
-
- The last example failed because a regular user is not allowed to write at
- the root of the file system.
-
- 9.2, p. 65
-
- As in the previous exercise, the function count performs the actual
- counting. It works on an input channel and returns a pair of integers.
-
- #let count inch =
- # let chars = ref 0
- # and lines = ref 0 in
- # try
- # while true do
- # let c = input_char inch in
- # chars := !chars + 1;
- # if c = `\n` then lines := !lines + 1 else ()
- # done;
- # (!chars, !lines)
- # with End_of_file -> (!chars, !lines)
- #;;
- count : in_channel -> int * int = <fun>
-
- The function wc opens a channel on its filename argument, calls count and
- prints the result.
-
-
- 130
-
-
-
- #let wc f =
- # let inch =
- # try open_in f
- # with sys__Sys_error msg ->
- # prerr_endline (f^": "^msg);
- # raise (Failure "wc")
- # | _ -> prerr_endline ("Unknown exception while opening "^f);
- # raise (Failure "wc")
- # in let (chars,lines) = count inch
- # in print_int chars;
- # print_string " characters, ";
- # print_int lines;
- # print_string " lines.\n"
- #;;
- wc : string -> unit = <fun>
-
- Counting /etc/passwd gives:
-
- #wc "/etc/passwd";;
- 32474 characters, 392 lines.
- - : unit = ()
-
-
- 10.1, p. 78
-
- Let us recall the definitions of the type token and of the lexical analyzer:
-
- #type token =
- # PLUS | MINUS | TIMES | DIV | LPAR | RPAR
- #| INT of int;;
- Type token defined.
-
- #(* Spaces *)
- #let rec spaces = function
- # [< '` `|`\t`|`\n`; spaces _ >] -> ()
- #| [< >] -> ();;
- spaces : char stream -> unit = <fun>
-
- #(* Integers *)
- #let int_of_digit = function
- # `0`..`9` as c -> (int_of_char c) - (int_of_char `0`)
- #| _ -> raise (Failure "not a digit");;
- int_of_digit : char -> int = <fun>
-
- #let rec integer n = function
- # [< ' `0`..`9` as c; (integer (10*n + int_of_digit c)) r >] -> r
- #| [< >] -> n;;
- integer : int -> char stream -> int = <fun>
-
- #(* The lexical analyzer *)
- #let rec lexer s = match s with
-
-
- 131
-
-
-
- # [< '`(`; spaces _ >] -> [< 'LPAR; lexer s >]
- #| [< '`)`; spaces _ >] -> [< 'RPAR; lexer s >]
- #| [< '`+`; spaces _ >] -> [< 'PLUS; lexer s >]
- #| [< '`-`; spaces _ >] -> [< 'MINUS; lexer s >]
- #| [< '`*`; spaces _ >] -> [< 'TIMES; lexer s >]
- #| [< '`/`; spaces _ >] -> [< 'DIV; lexer s >]
- #| [< '`0`..`9` as c; (integer (int_of_digit c)) n; spaces _ >]
- # -> [< 'INT n; lexer s >];;
- lexer : char stream -> token stream = <fun>
-
- The parser has the same shape as the grammar:
-
- #let rec expr = function
- # [< 'INT n >] -> n
- #| [< 'PLUS; expr e1; expr e2 >] -> e1+e2
- #| [< 'MINUS; expr e1; expr e2 >] -> e1-e2
- #| [< 'TIMES; expr e1; expr e2 >] -> e1*e2
- #| [< 'DIV; expr e1; expr e2 >] -> e1/e2;;
- expr : token stream -> int = <fun>
-
- #expr (lexer (stream_of_string "1"));;
- - : int = 1
-
- #expr (lexer (stream_of_string "+ 1 * 2 4"));;
- - : int = 9
-
-
- 10.2, p. 78
-
- The only new function that we need is a function taking as argument a
- character stream, and returning the first identifier of that stream. It
- could be written as:
-
- #let ident_buf = make_string 8 ` `;;
- ident_buf : string = " "
-
- #let rec ident len = function
- # [< ' `a`..`z`|`A`..`Z` as c;
- # (if len >= 8 then ident len
- # else begin
- # set_nth_char ident_buf len c;
- # ident (succ len)
- # end) s >] -> s
- #| [< >] -> sub_string ident_buf 0 len;;
- ident : int -> char stream -> string = <fun>
-
- The lexical analyzer will first try to recognize an alphabetic character c,
- then put c at position 0 of ident_buf, and call ident 1 on the rest of the
- character stream. Alphabetic characters encountered will be stored in the
- string buffer ident_buf, up to the 8th. Further alphabetic characters will
- be skipped. Finally, a substring of the buffer will be returned as result.
-
-
- 132
-
-
-
- #let s = stream_of_string "toto 1";;
- s : char stream = <abstr>
-
- #ident 0 s;;
- - : string = "toto"
-
- #(* Let us see what remains in the stream *)
- #match s with [< 'c >] -> c;;
- - : char = ` `
-
- #let s = stream_of_string "LongIdentifier ";;
- s : char stream = <abstr>
-
- #ident 0 s;;
- - : string = "LongIden"
-
- #match s with [< 'c >] -> c;;
- - : char = ` `
-
- The definitions of the new token type and of the lexical analyzer is
- trivial, and we shall omit them. A slightly more complex lexical analyzer
- recognizing identifiers (lowercase only) is given in section 12.2.1 in this
- part.
-
- 11.1, p. 83
-
- (* main.ml *)
- let chars = counter__new 0;;
- let lines = counter__new 0;;
-
- let count_file filename =
- let in_chan = open_in filename in
- try
- while true do
- let c = input_char in_chan in
- counter__incr chars;
- if c = `\n` then counter__incr lines
- done
- with End_of_file ->
- close_in in_chan
- ;;
-
- for i = 1 to vect_length sys__command_line - 1 do
- count_file sys__command_line.(i)
- done;
- print_int (counter__read chars);
- print_string " characters, ";
- print_int (counter__read lines);
- print_string " lines.\n";
- exit 0;;
-
-
-
- 133
-
-
-
-
-
-
-
-
-
- Chapter 18
-
-
-
- Conclusions and further reading
-
-
-
- We have not been exhaustive in the description of the Caml Light features.
- We only introduced general concepts in functional programming, and we have
- insisted on the features used in the prototyping of ASL: a tiny model of
- Caml Light typing and semantics.
- The reference manual [20] provides an exhaustive description of the Caml
- Light language, its libraries, commands and extensions.
- Those who read French are referred to [38], a progressive, but thorough
- introduction to programming in Caml Light, with many interesting examples,
- and to [21], the French edition of the Caml Light reference manual.
- Description about ``Caml Strong'' and useful information about programming
- in Caml can be found in [9] and [37].
- An introduction to lambda-calculus and type systems can be found in [17],
- [12] and [4].
- The description of the implementation of call-by-value functional
- programming languages can be found in [19].
- The implementation of lazy functional languages is described in [29]
- (translated in French as [30]). An introduction to programming in lazy
- functional languages can be found in [5].
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 134
-
-
-
-
-
-
-
-
-
-
- Bibliography
-
-
-
- [1] H. Abelson and G.J. Sussman. Structure and Interpretation of Computer
- Programs. The MIT Press, 1985.
-
- [2] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques
- and Tools. Addison Wesley, 1986.
-
- [3] J. Backus. Can programming be liberated from the Von Neumann style? A
- functional style and its algebra of programs. In Communications of the
- ACM, volume 21, pages 133--140, 1978.
-
- [4] H.P. Barendregt. The Lambda-Calculus: Its Syntax and Semantics,
- volume 103 of Studies in Logic and the Foundations of Mathematics.
- North-Holland, 1984.
-
- [5] R. Bird and P. Wadler. Introduction to Functional Programming. Series
- in Computer Science. Prentice-Hall International, 1986.
-
- [6] A. Church. The Calculi of Lambda Conversion. Princeton University
- Press, 1941.
-
- [7] D. Clement, J. Despeyroux, T. Despeyroux, and G. Kahn. A simple
- applicative language: Mini-ML. In Proceedings of the ACM
- International Conference on Lisp and Functional Programming, pages
- 13--27, 1986.
-
- [8] G. Cousineau, P.-L. Curien, and M. Mauny. The categorical abstract
- machine. In Functional Programming Languages and Computer
- Architecture, number 201 in Lecture Notes in Computer Science, pages
- 50--64. Springer Verlag, 1985.
-
- [9] G. Cousineau and G. Huet. The CAML primer. Technical Report 122,
- INRIA, 1990.
-
- [10] N. De Bruijn. Lambda-calculus notation with nameless dummies, a tool
- for automatic formula manipulation. Indag. Math., 1962.
-
- [11] M. Gordon, R. Milner, L. Morris, M. Newey, and C. Wadworth. A
- metalanguage for interactive proofs in LCF. In Proceedings of ACM
- Symposium on Principles of Programming Languages, pages 119--130,
- 1978.
-
-
- 135
-
-
-
- [12] J.R. Hindley and J.P. Seldin. Introduction to Combinators and
- l-calculus. London Mathematical Society, Student Texts. Cambridge
- University Press, 1986.
-
- [13] C.A.R. Hoare. Quicksort. Computer Journal, 5(1), 1962.
-
- [14] W.A. Howard. The formulae-as-type notion of construction, pages
- 479--490. Academic Press, 1980.
-
- [15] P. Hudak and P. Wadler. Report on the programming language Haskell.
- Technical Report YALEU/DCS/RR-777, Yale University, 1990.
-
- [16] T. Johnsson. Efficient compilation of lazy evaluation. In Proceedings
- of ACM Conference on Compiler Construction, pages 58--69, 1984.
-
- [17] J.-L. Krivine. Lambda-calcul, types et modeles. Etudes et recherches
- en informatique. Masson, 1990.
-
- [18] P. Landin. The next 700 programming languages. In Communications of
- the ACM, volume 9, pages 157--164, 1966.
-
- [19] X. Leroy. The ZINC experiment: an economical implementation of the ML
- language. Technical Report 117, INRIA, 1990.
-
- [20] X. Leroy. The Caml Light system, release 0.6 --- Documentation and
- user's manual. Technical report, INRIA, 1993. Included in the Caml
- Light 0.6 distribution.
-
- [21] X. Leroy and P. Weis. Manuel de reference du langage Caml.
- InterEditions, 1993.
-
- [22] J. MacCarthy. Lisp 1.5 Programmer's Manual. MIT Press, Cambridge,
- Mass., 1962.
-
- [23] D. MacQueen. Modules for Standard ML. In Proceedings of the ACM
- Conference on Lisp and Functional Programming, 1984.
-
- [24] M. Mauny. Functional programming using CAML. Technical Report 129,
- INRIA, 1991.
-
- [25] M. Mauny and D. de Rauglaudre. Parsers in ML. In Proceedings of the
- ACM Conference on Lisp and Functional Programming, 1992.
-
- [26] M. Mauny and A. Suarez. Implementing functional languages in the
- categorical abstract machine. In Proceedings of the ACM International
- Conference on Lisp and Functional Programming, pages 266--278, 1986.
-
- [27] R. Milner. A theory of type polymorphism in programming. J. Comput.
- Syst. Sci., 17:348--375, 1978.
-
- [28] R. Milner. A proposal for Standard ML. In Proceedings of the ACM
-
-
- 136
-
-
-
- Conference on Lisp and Functional Programming, 1987.
-
- [29] S.L. Peyton-Jones. The Implementation of Functional Programming
- Languages. Series in Computer Science. Prentice-Hall International,
- 1987.
-
- [30] S.L. Peyton-Jones. Mise en uvre des langages fonctionnels de
- programmation. Manuels Informatiques Masson. Masson, 1990.
-
- [31] J. Rees and W. Clinger. The revised3 report on the algorithmic
- language Scheme. In SIGPLAN Notices, volume 21, 1987.
-
- [32] J. A. Robinson. Computational logic: the unification computation. In
- Machine Intelligence, volume 6 of American Elsevier. B. Meltzer and D.
- Mitchie (Eds), 1971.
-
- [33] R. Sedgewick. Algorithms. Addison Wesley, 1988.
-
- [34] D. Turner. SASL language manual. Technical report, St Andrews
- University, 1976.
-
- [35] D. Turner. Recursion equations as a programming language, pages 1--28.
- Cambridge University Press, 1982.
-
- [36] D. Turner. Miranda: a non-strict functional language with polymorphic
- types. In Functional Programming Languages and Computer Architecture,
- number 201 in Lecture Notes in Computer Science, pages 1--16. Springer
- Verlag, 1985.
-
- [37] P. Weis, M.V. Aponte, A. Laville, M. Mauny, and A. Suarez. The CAML
- reference manual. Technical Report 121, INRIA, 1990.
-
- [38] P. Weis and X. Leroy. Le langage Caml. InterEditions, 1993.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 137
-